Remove the rows of dataframe based on date and flag condition in pandas

Question:

I have a dataframe

df = pd.DataFrame([["A","13-02-2022","B","FALSE"],["A","13-02-2022","C","FALSE"],["A","14-02-2022","D","FALSE"],
                   ["A","14-02-2022","E","FALSE"],["A","16-02-2022","A","TRUE"],["A","16-02-2022","F","FALSE"],
                   ["A","17-02-2022","G","FALSE"],["A","17-02-2022","H","FALSE"],["A","18-02-2022","I","FALSE"],
                   ["A","18-02-2022","J","FALSE"]],columns=["id1","date","id2","flag"])
id1   date     id2  flag
A   13-02-2022  B   FALSE
A   13-02-2022  C   FALSE
A   14-02-2022  D   FALSE
A   14-02-2022  E   FALSE
A   16-02-2022  A   TRUE
A   16-02-2022  F   FALSE
A   17-02-2022  G   FALSE
A   17-02-2022  H   FALSE
A   18-02-2022  I   FALSE
A   18-02-2022  J   FALSE

I want to remove all the rows of previous working day, next working day and the day where flag is TRUE.

For example here 16th Feb flag is TRUE, so remove all the rows of previous working day 14th Feb, next working day 17th Feb and 16th Feb. If TRUE is in last day of month 28th Feb where next working day is not there, then remove the rows of TRUE flag day and previous working day only.

Expected Output:

df_out = pd.DataFrame([["A","13-02-2022","B","FALSE"],["A","13-02-2022","C","FALSE"],["A","18-02-2022","I","FALSE"],
                       ["A","18-02-2022","J","FALSE"]],columns=["id1","date","id2","flag"])
id1   date     id2  flag
A   13-02-2022  B   FALSE
A   13-02-2022  C   FALSE
A   18-02-2022  I   FALSE
A   18-02-2022  J   FALSE

How to do it?

Asked By: Chethan

||

Answers:

You can use boolean indexing:

# ensure boolean and datetime
df['flag'] = df['flag'].eq('TRUE')
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

bday = pd.offsets.BusinessDay(1)

drop = pd.concat([dates+bday, dates-bday])

out = df[~(df['date'].isin(drop) | df['flag'])]

Output:

  id1       date id2   flag
0   A 2022-02-13   B  False
1   A 2022-02-13   C  False
2   A 2022-02-14   D  False
3   A 2022-02-14   E  False
5   A 2022-02-16   F  False
8   A 2022-02-18   I  False
9   A 2022-02-18   J  False
Answered By: mozway

You can try to create a filter data frame and select everything which is not in it:

df['date'] = pd.to_datetime(df['date'], format="%d-%m-%Y")

dates = df[df.flag == 'TRUE']['date']
to_drop = pd.concat([dates, dates + pd.offsets.BusinessDay(1), dates - pd.offsets.BusinessDay(1)])
df_out = df[~df['date'].isin(to_drop)]
df_out
Answered By: Guru Stron