Displaying duplicates in pandas

Question

I would like to display the duplicate rows of a DataFrame to understand them better. Specifically, I would like to group the duplicated rows together.

The example below hopefully clarifies what I want to do. Assume we are given the following DataFrame:


CC  BF  FA  WC  Strength
 1   2   3   4         1
 2   3   4   5         6
 1   2   3   4         8
 1   2   3   4         4
 2   3   4   5         7

Here, rows 1, 3, 4 and rows 2, 5 are duplicates of each other once the Strength column is ignored. I would like to get a new DataFrame that displays:

CC  BF  FA  WC  Strength_min  Strength_max  Count
 1   2   3   4             1             8      3
 2   3   4   5             6             7      2

Asked By: samabu

Answer 1

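You need a groupby.agg with named aggregations, grouping by every column except Strength. Index.difference gives you that list of columns (sort=False keeps them in their original order), and you can use it as the grouper: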

# Group on every column except 'Strength', then summarize 'Strength' per group.
keys = list(df.columns.difference(['Strength'], sort=False))
(df.groupby(keys)['Strength']
   .agg(Strength_min='min', Strength_max='max', Count='count')
   .reset_index()
)

Output:

   CC  BF  FA  WC  Strength_min  Strength_max  Count
0   1   2   3   4             1             8      3
1   2   3   4   5             6             7      2
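
For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch. The DataFrame is rebuilt from the sample data in the question. The final filter on Count is an assumption on my part (it is not in the original answer): it drops key combinations that occur only once, since those are not really duplicates.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'CC': [1, 2, 1, 1, 2],
    'BF': [2, 3, 2, 2, 3],
    'FA': [3, 4, 3, 3, 4],
    'WC': [4, 5, 4, 4, 5],
    'Strength': [1, 6, 8, 4, 7],
})

# Every column except 'Strength' acts as a grouping key; sort=False keeps
# the columns in their original order.
keys = list(df.columns.difference(['Strength'], sort=False))

# Named aggregation: each keyword becomes an output column.
summary = (
    df.groupby(keys)['Strength']
      .agg(Strength_min='min', Strength_max='max', Count='count')
      .reset_index()
)

# Assumption (not part of the original answer): keep only key combinations
# that occur more than once, i.e. the actual duplicates.
duplicates_only = summary[summary['Count'] > 1]
print(duplicates_only)
```

Note that 'count' only counts non-missing Strength values. If Strength can contain NaN and you want the number of rows per group instead, 'size' is probably the better choice.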

Answered By: mozway
