Pandas drop duplicates based on one group and keep the last value

Question:

I have a dataframe:

import pandas as pd

data = pd.DataFrame({"col1": ["a", "a", "a", "a", "a", "a"],
                     "col2": [0,0,0,1,1, 1],
                     "col3": [1,2,3,4,5, 6]})

data


  col1  col2    col3
0   a   0   1
1   a   0   2
2   a   0   3
3   a   1   4
4   a   1   5
5   a   1   6

I’m trying to remove the duplicates based on col2 == 1 and keep the last entry

Using the below code I was able to keep the first and drop others.

data[~(data.duplicated(["col2"]) & data.col2.eq(1))]
col1    col2    col3
0   a   0   1
1   a   0   2
2   a   0   3
3   a   1   4

How to remove duplicates based on one category in a column and keep the last entry?

Desired Output

  col1  col2    col3
0   a   0   1
1   a   0   2
2   a   0   3
3   a   1   6
Asked By: Ailurophile

||

Answers:

Use boolean indexing with help of groupby.cumcount (or duplicated):

# is the row not a 1 in col2?
m1 = data['col2'].ne(1)
# is the row the last of the group?
m2 = data.groupby('col2').cumcount(ascending=False).eq(0)
# or
# m2 = ~data['col2'].duplicated(keep='last')

# keep rows matching either condition
out = data[m1|m2]

Reversed logic:

# is the group a 1?
m1 = data['col2'].eq(1)
# is the row NOT the last one?
m2 = data['col2'].duplicated(keep='last')

# drop the rows matching both conditions
out = data[~(m1&m2)]

Output:

  col1  col2  col3
0    a     0     1
1    a     0     2
2    a     0     3
5    a     1     6
Answered By: mozway

You can also concat the last row of the removed to the kept dataframe.

m = df['col2'].eq(1)
out = pd.concat([df[~m], df[m].iloc[[-1]]]).sort_index()
print(out)

  col1  col2  col3
0    a     0     1
1    a     0     2
2    a     0     3
5    a     1     6
Answered By: Ynjxsjmh