how to lookup between two dataframes and update the value in another df
Question:
I have a dataframe df1
like below –
|email_id| date |
|[email protected] | ['2022-04-09'] |
|[email protected] | [nan]
|[email protected] | ['2022-09-21','2022-03-09'] |
|[email protected] | [nan, '2022-03-29'] |
|[email protected] | [nan] |
|[email protected] | [nan,'2022-09-01']
Another df df2
–
|email_id| status |
|[email protected] | 0 |
|[email protected] | 0 |
|[email protected] | 0 |
|[email protected] | 3 |
|[email protected] | 2 |
|[email protected] | 1 |
How can I lookup email_id from df2 in df1 and update the status in df2?
If we have the date values present in df1 date column , status for that email_id should be 0, and if we have any nan values present, the status should be 1. If some email_id from df2 doesn’t match in df1 , will keep the status as same.
Expected output of df2 –
|email_id| status |
|[email protected] | 1 |
|[email protected] | 0 |
|[email protected] | 1 |
|[email protected] | 3 |
|[email protected] | 2 |
|[email protected] | 1 |
Please help me out. Thanks in advance!
Answers:
First use DataFrame.explode
for column from lists, then create compare for missing values with aggregate max
for mapping Series, use Series.map
with replace non matched values to original column df2['status']
:
df = df1.explode('date')
s = df['date'].isna().astype(int).groupby(df['email_id'].str.lower()).max()
print (s)
email_id
[email protected] 1
[email protected] 0
[email protected] 1
[email protected] 1
[email protected] 1
Name: date, dtype: int32
df2['status'] = df2['email_id'].str.lower().map(s).fillna(df2['status']).astype(int)
print (df2)
email_id status
0 [email protected] 1
1 [email protected] 0
2 [email protected] 1
3 [email protected] 3
4 [email protected] 2
5 [email protected] 1
I have a dataframe df1
like below –
|email_id| date |
|[email protected] | ['2022-04-09'] |
|[email protected] | [nan]
|[email protected] | ['2022-09-21','2022-03-09'] |
|[email protected] | [nan, '2022-03-29'] |
|[email protected] | [nan] |
|[email protected] | [nan,'2022-09-01']
Another df df2
–
|email_id| status |
|[email protected] | 0 |
|[email protected] | 0 |
|[email protected] | 0 |
|[email protected] | 3 |
|[email protected] | 2 |
|[email protected] | 1 |
How can I lookup email_id from df2 in df1 and update the status in df2?
If we have the date values present in df1 date column , status for that email_id should be 0, and if we have any nan values present, the status should be 1. If some email_id from df2 doesn’t match in df1 , will keep the status as same.
Expected output of df2 –
|email_id| status |
|[email protected] | 1 |
|[email protected] | 0 |
|[email protected] | 1 |
|[email protected] | 3 |
|[email protected] | 2 |
|[email protected] | 1 |
Please help me out. Thanks in advance!
First use DataFrame.explode
for column from lists, then create compare for missing values with aggregate max
for mapping Series, use Series.map
with replace non matched values to original column df2['status']
:
df = df1.explode('date')
s = df['date'].isna().astype(int).groupby(df['email_id'].str.lower()).max()
print (s)
email_id
[email protected] 1
[email protected] 0
[email protected] 1
[email protected] 1
[email protected] 1
Name: date, dtype: int32
df2['status'] = df2['email_id'].str.lower().map(s).fillna(df2['status']).astype(int)
print (df2)
email_id status
0 [email protected] 1
1 [email protected] 0
2 [email protected] 1
3 [email protected] 3
4 [email protected] 2
5 [email protected] 1