Pandas: how to add column with Booleans (True/False) based on duplicates in one column and group index in another column
Question:
I have the following dataframe:
import pandas as pd

d_test = {
    'name': ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5]
}
df_test = pd.DataFrame(d_test)
I am looking for a way to add a column duplicate with True/False for each entry. I want True only if a ‘name’ appears more than once and those occurrences belong to more than one ‘group’ number. Here is the expected output:
     name  group  duplicate
0     bob      1       True
1     rob      4      False
2     dan      3       True
3  steeve      3      False
4    carl      2       True
5  steeve      3      False
6     dan      2       True
7    carl      1       True
8     bob      5       True
In the example above, row 0 has True in duplicate because its name is the same as in row 8 and the group numbers are different (1 and 5). Row 3 has False in duplicate because no duplicates exist outside of the same group 3.
Answers:
There seems to be something wrong in your example. Perhaps what you want is possible with the following code:
result = df_test.groupby('name')['group'].transform(lambda x: x.nunique() > 1)
df_test.assign(duplicated=result)
Output of df_test.assign(duplicated=result):
     name  group  duplicated
0     bob      1        True
1     rob      4       False
2     dan      3        True
3  steeve      3       False
4    carl      2        True
5  steeve      3       False
6     dan      2        True
7    carl      1        True
8     bob      5        True
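The same idea can probably be written without a lambda by passing the built-in 'nunique' aggregation name to transform and comparing the result afterwards. This is only a sketch of an equivalent variant, not a different method: it counts the distinct group values per name, broadcasts that count back to every row, and flags rows where the count is greater than one. The column name duplicate is taken from the question (the answer above uses duplicated), and the result is stored on df_test directly rather than returned as a copy by assign.

```python
import pandas as pd

# Data from the question
d_test = {
    'name': ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5],
}
df_test = pd.DataFrame(d_test)

# Number of distinct groups each name appears in, aligned to the original rows
groups_per_name = df_test.groupby('name')['group'].transform('nunique')

# True when the name occurs in more than one distinct group
df_test['duplicate'] = groups_per_name.gt(1)

print(df_test)
```

Note that a name repeated only within a single group (steeve, group 3) gets False, and a name that appears once (rob) also gets False, because both have exactly one distinct group.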