How to find one-to-one relation in between two columns with duplicates?

Question

I have a dataset like the following:

import pandas as pd

dict1 = {
    "idx": [1, 2, 9, 1, 1, 6, 1, 3, 2],
    "value": ["a1", "b1", "c1", "t1", "t1", "f1", "r1", "l1", "b1"]
}

df = pd.DataFrame(dict1)

Here, a1, t1, r1 always occur with 1. To verify this, I used the following code,

def one_to_one(df, col1, col2):
    first = df.groupby(col1)[col2].count().max()
    second = df.groupby(col2)[col1].count().max()
    return first + second == 2

one_to_one(df, 'idx', 'value')

But it returned None even though a1, t1, r1 always occur with 1 and others do the same. How to verify that, a1 for example, only occurs with idx = 1?

Asked By: Gravity Mass

||

Source

Answer 1

You can just drop_duplicates based idx + value combination. If each value has only one idx, then the result should contain the same number of records as the number of unique values in value column:

len(df[['idx', 'value']].drop_duplicates()) == df['value'].nunique()
# True

Answered By: Psidom

How to find one-to-one relation in between two columns with duplicates?

Question:

Answers: