How to identify rows that contain duplicates in multiple columns and specific values in others
Question:
I have a dataframe with many columns and rows. I want to identify rows that are duplicates across several columns, where one of the duplicates contains a specific value in another column and the other duplicate contains a different specific value.
Example
ID FDAT LACT DATE EVENT
1 1/1/2022 1 31/1/2022 FRESH
1 1/1/2022 1 15/2/2022 LUT
1 1/1/2022 1 15/3/2022 BRED
1 1/1/2022 1 15/3/2022 OS
1 1/1/2022 1 15/3/2022 PREG
1 1/1/2022 1 30/3/2022 OS
1 1/1/2022 1 30/3/2022 PREG
I can check for duplicates and a specific value with the following code:
df.loc[(df.duplicated(['ID', 'LACT', 'FDAT', 'DATE'])) & (df['EVENT']=='PREG')]
The problem is that I only want to include rows where one of the duplicates on the same day has the EVENT value "BRED". In the example above this is true on 15 March and false on 30 March, so I would like to delete the "PREG" row on 15 March only.
I am looking for a way to make the statement above conditional on EVENT = 'PREG' in one row while the other EVENT in the duplicate group is 'BRED'.
The result I am looking to achieve is:
ID FDAT LACT DATE EVENT
1 1/1/2022 1 31/1/2022 FRESH
1 1/1/2022 1 15/2/2022 LUT
1 1/1/2022 1 15/3/2022 BRED
1 1/1/2022 1 15/3/2022 OS
1 1/1/2022 1 30/3/2022 OS
1 1/1/2022 1 30/3/2022 PREG
Answers:
In this case you need to group by the columns 'ID', 'LACT', 'FDAT' and 'DATE'
and apply an additional filter on the relevant events:
events = {'BRED', 'PREG'}
# Within each duplicate group, drop the 'PREG' rows only when the group
# contains both 'BRED' and 'PREG'; otherwise keep the group unchanged.
df.groupby(['ID', 'LACT', 'FDAT', 'DATE'], sort=False).apply(
    lambda x: x[x['EVENT'].ne('PREG')] if set(x['EVENT']).issuperset(events)
    else x).reset_index(drop=True)
ID FDAT LACT DATE EVENT
0 1 1/1/2022 1 31/1/2022 FRESH
1 1 1/1/2022 1 15/2/2022 LUT
2 1 1/1/2022 1 15/3/2022 BRED
3 1 1/1/2022 1 15/3/2022 OS
4 1 1/1/2022 1 30/3/2022 OS
5 1 1/1/2022 1 30/3/2022 PREG
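As an alternative sketch, the same filter can be expressed without groupby.apply by using a per-group transform, which is usually faster on large frames and avoids the deprecation around applying over the grouping columns in recent pandas versions. The dataframe below is reconstructed from the example in the question; the variable names (keys, has_bred, result) are my own:

```python
import pandas as pd

# Sample data mirroring the example in the question
df = pd.DataFrame({
    'ID':   [1] * 7,
    'FDAT': ['1/1/2022'] * 7,
    'LACT': [1] * 7,
    'DATE': ['31/1/2022', '15/2/2022', '15/3/2022', '15/3/2022',
             '15/3/2022', '30/3/2022', '30/3/2022'],
    'EVENT': ['FRESH', 'LUT', 'BRED', 'OS', 'PREG', 'OS', 'PREG'],
})

keys = ['ID', 'LACT', 'FDAT', 'DATE']

# For each duplicate group, flag whether any row on that day is 'BRED'
has_bred = df['EVENT'].eq('BRED').groupby([df[k] for k in keys]).transform('any')

# Drop 'PREG' rows only in groups that also contain a 'BRED' row
result = df[~(has_bred & df['EVENT'].eq('PREG'))].reset_index(drop=True)
print(result)
```

This removes the 'PREG' row on 15/3/2022 (its group also contains 'BRED') while keeping the 'PREG' row on 30/3/2022, matching the desired output above.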