Pandas – Check if two dates are the same with determined conditions (2 more variables) and perform actions

Question:

new to pandas, I have a challenge regard checking values and performing multiple actions based on four variables (reception_date,final_date,Status,ID) the problem has the following table:

      id             user_email reception_date   end_date    status
0  42872     [email protected]      3/30/2022  3/30/2022  Accepted
1  42872   [email protected]       3/1/2022   3/4/2022  Returned
2  42872  [email protected]       3/7/2022  3/30/2022  In Study
3  99999                [email protected]       3/6/2022  3/28/2022  In Study
4  42872           [email protected]      3/23/2022  3/25/2022  In Study
5  99999                 [email protected]      3/28/2022   4/5/2022  Accepted
6  78787                 [email protected]      3/15/2022  3/16/2022  In Study

Firstly, it is required to perform operations for the same ID, (in this example only few are found, however, the database consists of more than 50,000 data), check if in the Status column contains "Accepted", once verified this, check if the "end_date" of the status "In Study" is equal to the "reception_date" of the status "Accepted", if this condition is true, change the status from "In Study" to "Accepted", the expected output would be as follows:

      id             user_email reception_date   end_date    status
0  42872     [email protected]      3/30/2022  3/30/2022  Accepted
1  42872   [email protected]       3/1/2022   3/4/2022  Returned
2  42872  [email protected]       3/7/2022  3/30/2022  Accepted
3  99999                [email protected]       3/6/2022  3/28/2022  Accepted
4  42872           [email protected]      3/23/2022  3/25/2022  In Study
5  99999                 [email protected]      3/28/2022   4/5/2022  Accepted
6  78787                 [email protected]      3/15/2022  3/16/2022  In Study

Since I’m relative new to pandas I’ve tried several methods, one of them being my last attemp,using

Test=Test.merge(Test.loc[Test.status== 'Accepted'], how='left', left_on=['id'], right_on=['id'], suffixes=("", "_y"))
.assign(status=lambda x:np.where((x.end_date_y==x.reception_date) & (x.status== 'In Study'), 'Accepted',x.status))

However the result of this wasn’t the expected output, I hope you can help me with this, it’s driving me crazy.

Asked By: Andrés HK

||

Answers:

You can use:

# which rows are Accepted?
m1 = df['status'].eq('Accepted')

# which rows are In Study?
m2 = df['status'].eq('In Study')

# get In Study indices that also have an Accepted
# on the same date
to_change = (df[m2]
 .reset_index()
 .merge(df[m1],
        left_on=['id', 'end_date'],
        right_on=['id', 'reception_date'])
 ['index']
)
# [2, 3]

# update in place
df.loc[to_change, 'status'] = 'Accepted'

output:

      id             user_email reception_date   end_date    status
0  42872     [email protected]      3/30/2022  3/30/2022  Accepted
1  42872   [email protected]       3/1/2022   3/4/2022  Returned
2  42872  [email protected]       3/7/2022  3/30/2022  Accepted
3  99999                [email protected]       3/6/2022  3/28/2022  Accepted
4  42872           [email protected]      3/23/2022  3/25/2022  In Study
5  99999                 [email protected]      3/28/2022   4/5/2022  Accepted
6  78787                 [email protected]      3/15/2022  3/16/2022  In Study
Answered By: mozway