Merge two data frames by matching an ID column and comparing two date columns with a condition
Question:
I have two data frames df1 and df2
df1:
l_id | c_id | dt |
---|---|---|
1a | c1 | 2023-01-01 |
1b | c1 | 2021-02-20 |
1c | c2 | 2022-11-25 |
1d | c2 | 2022-01-01 |
1d | c2 | 2022-03-01 |
1e | c3 | 2022-04-08 |
1f | c4 | 2022-06-12 |
and
df2:
c_id | r_dt |
---|---|
c1 | 2023-01-01 |
c1 | 2021-02-14 |
c2 | 2022-11-25 |
c2 | 2022-02-28 |
c5 | 2022-11-12 |
c4 | 2022-06-06 |
What I want is an exact match on c_id between the two data frames, and then a merge only where ‘dt’ in df1 minus ‘r_dt’ in df2 is less than 7 days. The join should be an inner join.
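The wording leaves two details open: whether the bound is "less than 7" or "less than or equal to 7" days, and whether ‘dt’ minus ‘r_dt’ is meant as a signed or an absolute difference. A minimal sketch of the row-level condition that makes both choices explicit; the function `within_window` and its `days`, `inclusive` and `absolute` parameters are hypothetical, introduced only for illustration.

```python
import pandas as pd


def within_window(dt, r_dt, days=7, inclusive=False, absolute=True):
    """Hypothetical helper: True if dt - r_dt falls inside the day window."""
    diff = (pd.Timestamp(dt) - pd.Timestamp(r_dt)).days  # signed whole days
    if absolute:
        diff = abs(diff)
    elif diff < 0:
        # signed reading: an r_dt later than dt never qualifies
        return False
    return diff <= days if inclusive else diff < days


print(within_window("2021-02-20", "2021-02-14"))  # 6 days apart
print(within_window("2022-01-01", "2022-02-28"))  # 58 days apart
print(within_window("2022-01-08", "2022-01-01", inclusive=True))  # exactly 7 days
```

For whole-day dates, `diff < 7` and `diff <= 6` are the same test; the distinction only matters for pairs exactly 7 days apart.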
My expected result looks exactly like this:
result_df:
l_id | c_id | dt |
---|---|---|
1a | c1 | 2023-01-01 |
1b | c1 | 2021-02-20 |
1c | c2 | 2022-11-25 |
1d | c2 | 2022-01-01 |
1d | c2 | 2022-03-01 |
1f | c4 | 2022-06-12 |
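Checked pair by pair, the expected table contains one row that the stated rule does not produce: `1d, c2, 2022-01-01` is 58 days before the nearest c2 ‘r_dt’ (2022-02-28) and 328 days before the other (2022-11-25). The table is reproduced exactly if the rule is applied per l_id, keeping every df1 row whose l_id has at least one match within 7 days. The sketch below assumes that interpretation, which is a guess at the intent rather than something the question states; the frames are built inline from the sample data.

```python
import pandas as pd

df1 = pd.DataFrame({
    "l_id": ["1a", "1b", "1c", "1d", "1d", "1e", "1f"],
    "c_id": ["c1", "c1", "c2", "c2", "c2", "c3", "c4"],
    "dt": pd.to_datetime(["2023-01-01", "2021-02-20", "2022-11-25", "2022-01-01",
                          "2022-03-01", "2022-04-08", "2022-06-12"]),
})
df2 = pd.DataFrame({
    "c_id": ["c1", "c1", "c2", "c2", "c5", "c4"],
    "r_dt": pd.to_datetime(["2023-01-01", "2021-02-14", "2022-11-25", "2022-02-28",
                            "2022-11-12", "2022-06-06"]),
})

# Row-level matches: same c_id and |dt - r_dt| < 7 days
pairs = df1.merge(df2, on="c_id", how="inner")
pairs = pairs[(pairs["dt"] - pairs["r_dt"]).dt.days.abs() < 7]

# Assumed intent: keep every df1 row whose l_id matched at least once
result_df = df1[df1["l_id"].isin(pairs["l_id"])].reset_index(drop=True)
print(result_df)
```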
I tried the following:
import pandas as pd
df1 = pd.read_csv("my_csv1.csv")
df2 = pd.read_csv("my_csv2.csv")
# merging
merge_df = pd.merge(df1, df2, how='inner', left_on=['c_id', 'dt'], right_on=['c_id', 'r_dt'])
But this does not give me the expected result. It only matches rows whose dates are exactly equal; it does not check whether the difference is less than or equal to 7 days.
Please help.
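The attempt above only joins equal dates. pandas also provides `pd.merge_asof`, which matches each left row to the nearest right row with the same `by` key, optionally only within a `tolerance`. This is a different technique from merging and then filtering: it keeps at most one ‘r_dt’ per df1 row (with `direction="backward"`, the latest one on or before ‘dt’), and it behaves like a left join, so unmatched rows are dropped afterwards to get inner-join behaviour. A minimal sketch, assuming the signed reading 0 <= dt - r_dt < 7 days and the question's sample data built inline:

```python
import pandas as pd

df1 = pd.DataFrame({
    "l_id": ["1a", "1b", "1c", "1d", "1d", "1e", "1f"],
    "c_id": ["c1", "c1", "c2", "c2", "c2", "c3", "c4"],
    "dt": pd.to_datetime(["2023-01-01", "2021-02-20", "2022-11-25", "2022-01-01",
                          "2022-03-01", "2022-04-08", "2022-06-12"]),
})
df2 = pd.DataFrame({
    "c_id": ["c1", "c1", "c2", "c2", "c5", "c4"],
    "r_dt": pd.to_datetime(["2023-01-01", "2021-02-14", "2022-11-25", "2022-02-28",
                            "2022-11-12", "2022-06-06"]),
})

# merge_asof requires both frames to be sorted by their date keys
left = df1.sort_values("dt")
right = df2.sort_values("r_dt")

matched = pd.merge_asof(
    left, right,
    left_on="dt", right_on="r_dt",
    by="c_id",                       # exact match on c_id
    direction="backward",            # only r_dt on or before dt
    tolerance=pd.Timedelta(days=6),  # dt - r_dt <= 6 days, i.e. < 7 whole days
)

# merge_asof keeps every left row; drop those with no r_dt in the window,
# then order by l_id for display
result_df = (matched.dropna(subset=["r_dt"])
             .sort_values(["l_id", "dt"])
             .reset_index(drop=True))
print(result_df[["l_id", "c_id", "dt", "r_dt"]])
```

Because it returns only the nearest ‘r_dt’, this avoids duplicate rows when several ‘r_dt’ values fall inside the window, but it cannot match an ‘r_dt’ that comes after ‘dt’ unless `direction="nearest"` is used instead.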
Answers:
Do it in two steps: merge on the id first, then filter, like this:
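merge_df = pd.merge(df1, df2, how='inner', on='c_id')
# dt/r_dt must be datetimes (read_csv loads them as strings unless parse_dates is used)
merge_df[['dt', 'r_dt']] = merge_df[['dt', 'r_dt']].apply(pd.to_datetime)
merge_df = merge_df[(merge_df['dt'] - merge_df['r_dt']).dt.days.abs() < 7]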
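A self-contained version of the two-step approach, for trying it end to end. `CSV1` and `CSV2` are stand-ins for `my_csv1.csv` and `my_csv2.csv`, holding the sample data from the question; `parse_dates` makes the date columns datetimes at load time, which the subtraction needs. The final projection back to df1's columns, with `drop_duplicates`, is an added step to match the shape of `result_df` in the question.

```python
import io

import pandas as pd

# Stand-ins for my_csv1.csv / my_csv2.csv (sample data from the question)
CSV1 = """l_id,c_id,dt
1a,c1,2023-01-01
1b,c1,2021-02-20
1c,c2,2022-11-25
1d,c2,2022-01-01
1d,c2,2022-03-01
1e,c3,2022-04-08
1f,c4,2022-06-12
"""
CSV2 = """c_id,r_dt
c1,2023-01-01
c1,2021-02-14
c2,2022-11-25
c2,2022-02-28
c5,2022-11-12
c4,2022-06-06
"""

# parse_dates makes dt / r_dt datetime64 columns, so they can be subtracted
df1 = pd.read_csv(io.StringIO(CSV1), parse_dates=["dt"])
df2 = pd.read_csv(io.StringIO(CSV2), parse_dates=["r_dt"])

# Step 1: inner join on the id only (every dt/r_dt pair within a c_id)
merge_df = pd.merge(df1, df2, how="inner", on="c_id")

# Step 2: keep pairs less than 7 days apart (use <= 7 for "7 days or fewer")
merge_df = merge_df[(merge_df["dt"] - merge_df["r_dt"]).dt.days.abs() < 7]

# Back to df1's columns; drop_duplicates guards against one df1 row
# matching more than one r_dt inside the window
result_df = (merge_df[["l_id", "c_id", "dt"]]
             .drop_duplicates()
             .reset_index(drop=True))
print(result_df)
```

On the sample data this keeps five rows (1a, 1b, 1c, the 2022-03-01 row of 1d, and 1f); the 2022-01-01 row of 1d is dropped because no c2 ‘r_dt’ is within 7 days of it.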