Pandas: how to add a column of Booleans (True/False) based on duplicates in one column and group numbers in another column

Question:

I have the following dataframe:

import pandas as pd

d_test = {
    'name': ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5]
}
df_test = pd.DataFrame(d_test)

I am looking for a way to add a column duplicate with True/False for each entry. I want True only when a ‘name’ appears more than once and its occurrences belong to more than one ‘group’ number. Here is the expected output:

    name    group   duplicate
0   bob     1       True
1   rob     4       False
2   dan     3       True
3   steeve  3       False
4   carl    2       True
5   steeve  3       False
6   dan     2       True
7   carl    1       True
8   bob     5       True

In the example above, row 0 has True in duplicate because its name is the same as in row 8 and the group numbers differ (1 and 5). Row 3 has False because steeve's only other occurrence (row 5) is in the same group, 3.
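One way to express this rule in code is a minimal sketch using only standard pandas calls: first collapse repeated (name, group) pairs, so that steeve's two rows in group 3 count once; any name that still appears more than once must then span several groups. The variable names pairs and multi_group_names are illustrative, not from the question.

```python
import pandas as pd

d_test = {
    'name': ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5],
}
df_test = pd.DataFrame(d_test)

# Keep one row per distinct (name, group) pair.
pairs = df_test[['name', 'group']].drop_duplicates()

# Names that still repeat after deduplication occur in more than one group.
multi_group_names = pairs.loc[pairs['name'].duplicated(keep=False), 'name']

# Flag every original row whose name is in that set.
df_test['duplicate'] = df_test['name'].isin(multi_group_names)
print(df_test)
```

This reproduces the expected duplicate column, including False for both steeve rows and for the single rob row.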

Asked By: illuminato


Answers:

There seems to be something wrong in your example. Perhaps what you want is possible with the following code:

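# True for every row whose name occurs in more than one distinct group
result = df_test.groupby('name')['group'].transform(lambda x: x.nunique() > 1)
# assign() returns a new DataFrame; df_test itself is left unchanged
df_test.assign(duplicated=result)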

Output of df_test.assign(duplicated=result):

    name    group   duplicated
0   bob     1       True
1   rob     4       False
2   dan     3       True
3   steeve  3       False
4   carl    2       True
5   steeve  3       False
6   dan     2       True
7   carl    1       True
8   bob     5       True

Answered By: Panda Kim
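A variant of the same groupby/transform idea, sketched here under the assumption that a persistent column named duplicate (as in the question) is wanted, passes the string 'nunique' to transform instead of a lambda. This lets pandas use its built-in per-group count, which generally avoids calling a Python function once per group, and the count is broadcast back to every row of the original index.

```python
import pandas as pd

df_test = pd.DataFrame({
    'name': ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5],
})

# Number of distinct groups for each row's name, aligned to the original rows.
n_groups = df_test.groupby('name')['group'].transform('nunique')

# Add the column in place (unlike assign(), which returns a new DataFrame).
df_test['duplicate'] = n_groups.gt(1)
print(df_test)
```

Using gt(1) on the count keeps the threshold explicit, so the same line can be adapted if a name should be flagged only when it spans, say, three or more groups.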