Check duplicated indices for each subset of values in pandas dataframe
Question:
I have the following dataframe:
import pandas as pd
df_test = pd.DataFrame(data=[['AP1', 'House1'],
                             ['AP1', 'House1'],
                             ['AP2', 'House1'],
                             ['AP3', 'House2'],
                             ['AP4', 'House2'],
                             ['AP5', 'House2']],
                       columns=['AP', 'House'],
                       index=[0, 1, 2, 0, 1, 1])
I need to check, for each subset of rows sharing a value in a column, whether there are duplicated indices. For example, in column House, the three entries of House1 have no duplicated indices, but the entries of House2 have one duplicated index, 1.
I have tried this:
print(f'{df_test.index.duplicated().sum()} repeated entries')
But this gives 3 duplicated entries, since it does not consider each value of the column separately.
Answers:
A possible solution:
print(df_test.reset_index().duplicated(['index', 'AP']).sum())
print(df_test.reset_index().duplicated(['index', 'House']).sum())
Output:
0
1
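Why this works: after reset_index(), the old index becomes an ordinary column, so duplicated(['index', 'House']) flags a row only when both its index label and its House value already appeared in an earlier row. Below is a minimal sketch that wraps the same idea for any column. The helper name count_dup_index is hypothetical, and it assumes the index is unnamed, so that reset_index() produces a column called index.

```python
import pandas as pd

df_test = pd.DataFrame(
    data=[['AP1', 'House1'], ['AP1', 'House1'], ['AP2', 'House1'],
          ['AP3', 'House2'], ['AP4', 'House2'], ['AP5', 'House2']],
    columns=['AP', 'House'],
    index=[0, 1, 2, 0, 1, 1],
)

def count_dup_index(df, column):
    """Count rows whose (index label, column value) pair already occurred.

    Hypothetical helper; assumes an unnamed index, which reset_index()
    turns into a column called 'index'.
    """
    flat = df.reset_index()
    return int(flat.duplicated(['index', column]).sum())

ap_dups = count_dup_index(df_test, 'AP')
house_dups = count_dup_index(df_test, 'House')
print(ap_dups)
print(house_dups)
```

Note that this returns a single total across all values of the column, not a count per value.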
You can use (the names parameter of reset_index requires pandas 1.5 or later):
>>> (df_test.reset_index(names='Dups')
...         .groupby('House', as_index=False)['Dups']
...         .agg(lambda x: x.duplicated().sum()))
    House  Dups
0  House1     0
1  House2     1
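If you also want to see which rows are involved rather than only the counts, one option is a boolean mask with keep=False, which marks every member of a duplicated (index, House) pair instead of only the later occurrences. This is a sketch under the same assumption of an unnamed index; the variable names flat, mask and offending are illustrative.

```python
import pandas as pd

df_test = pd.DataFrame(
    data=[['AP1', 'House1'], ['AP1', 'House1'], ['AP2', 'House1'],
          ['AP3', 'House2'], ['AP4', 'House2'], ['AP5', 'House2']],
    columns=['AP', 'House'],
    index=[0, 1, 2, 0, 1, 1],
)

# Move the index into a regular column (named 'index' for an unnamed index).
flat = df_test.reset_index()

# keep=False marks all rows of a repeated (index, House) pair,
# not just the second and later occurrences.
mask = flat.duplicated(['index', 'House'], keep=False)

# Rows that share both an index label and a House value with another row.
offending = flat[mask]
print(offending)
```

Here both House2 rows with index 1 are selected, which is why this yields two rows while the counting approach reports one duplicate.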