Drop duplicates from a pandas DataFrame based on other column values

Question:

The DataFrame I am using is shown below:

Name    NoOfTrans   Avg_pass_time    Cons.Error            RunCounts
Jan     0                            Failed:abcd           4
Jan                                                        4
Jan                                                        4
Jan                                                        4
May     2                            Failed:abcFailed:cde  5
May                                                        5
May                  1200                                  5
May                  1200                                  5
May                                                        5

I need to remove the duplicates from the "Name", "Avg_pass_time" and "RunCounts" columns, grouped by the "Name" column, so that the output is as below:

Name    NoOfTrans   Avg_pass_time    Cons.Error            RunCounts
Jan     0                            Failed:abcd           4
May     2           1200             Failed:abcFailed:cde  5

Any guidance would be useful.

Asked By: trainset

Answers:

You can select a subset of columns that will be used to identify the duplicates to drop:

df = df.drop_duplicates(subset=['Name','Avg_pass_time','RunCounts'])

Untested but this should work.
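As a quick check of the subset approach above, here is a minimal sketch on a hand-built copy of the question's sample data. It assumes the blank cells are empty strings and that the values in NoOfTrans and Avg_pass_time are stored as strings; the real data may differ.

```python
import pandas as pd

# Hand-built copy of the question's table.
# Assumption: blank cells are empty strings, not NaN.
df = pd.DataFrame({
    "Name": ["Jan"] * 4 + ["May"] * 5,
    "NoOfTrans": ["0", "", "", "", "2", "", "", "", ""],
    "Avg_pass_time": ["", "", "", "", "", "", "1200", "1200", ""],
    "Cons.Error": ["Failed:abcd", "", "", "", "Failed:abcFailed:cde", "", "", "", ""],
    "RunCounts": [4] * 4 + [5] * 5,
})

# Keep the first row for each distinct (Name, Avg_pass_time, RunCounts) combination.
deduped = df.drop_duplicates(subset=["Name", "Avg_pass_time", "RunCounts"])
print(deduped)
```

Under these assumptions, May keeps two rows (one with a blank Avg_pass_time and one with 1200), because drop_duplicates compares whole value combinations and does not merge values across rows.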

Answered By: Jeremy Savage

If each group contains only empty strings or duplicated values, use:

import numpy as np
df = df.replace('', np.nan).groupby('Name', as_index=False).first().fillna('')
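The groupby approach above can be tried on a hand-built copy of the question's sample data. This is a sketch under the assumption that blank cells are empty strings and that NoOfTrans and Avg_pass_time hold strings; the name result is only for illustration.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the question's table.
# Assumption: blank cells are empty strings, not NaN.
df = pd.DataFrame({
    "Name": ["Jan"] * 4 + ["May"] * 5,
    "NoOfTrans": ["0", "", "", "", "2", "", "", "", ""],
    "Avg_pass_time": ["", "", "", "", "", "", "1200", "1200", ""],
    "Cons.Error": ["Failed:abcd", "", "", "", "Failed:abcFailed:cde", "", "", "", ""],
    "RunCounts": [4] * 4 + [5] * 5,
})

result = (
    df.replace("", np.nan)              # treat blanks as missing values
      .groupby("Name", as_index=False)  # one output row per Name
      .first()                          # first non-missing value in each column
      .fillna("")                       # restore blanks where a group had no value
)
print(result)
```

Because first() skips missing values column by column, each Name collapses to a single row that combines the non-blank entries from all of its rows. If a group held two different non-blank values in the same column, only the first would be kept.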
Answered By: jezrael