How to aggregate DataFrame and drop duplicates based on values in two columns in Python Pandas?

Question:

I have DataFrame in Python Pandas like below:

ID  | COL1| COL2 | COL3
----------|------|------
123 | XXX | 0    | 1
123 | XXX | 1    | 1
444 | ABC | 1    | 1
444 | ABC | 1    | 1 
555 | PPP | 0    | 0

And I need to drop duplicates in above DF in that way:

  • if in COL2 or COL3 is at least once ‘1’ then should be 1 in these columns for ID (nevermind how often he had 0 in mentioned columns)
  • rest of columns should still be in output
  • In COL1 the is no duplicates per ID

So as a result I need output like below (I have many more columns so in output I need to have not only ID, COL2, COL3, but ID, COL1, COL2, COL3)

ID  | COL1| COL2 | COL3
----|-----|------|-----
123 | XXX | 1    | 1
444 | ABC | 1    | 1
555 | PPP | 0    | 0

How can I do that in Python Pandas ?

Asked By: unbik

||

Answers:

Use a groupby.max:

out = df.groupby(['ID', 'COL1'], as_index=False).max()

output:

    ID COL1  COL2  COL3
0  123  XXX     1     1
1  444  ABC     1     1
2  555  PPP     0     0
Answered By: mozway