Merge two data frames on a matching id column and a date-difference condition between two date columns

Question:

I have two data frames df1 and df2

df1:

l_id c_id dt
1a c1 2023-01-01
1b c1 2021-02-20
1c c2 2022-11-25
1d c2 2022-01-01
1d c2 2022-03-01
1e c3 2022-04-08
1f c4 2022-06-12

and

df2:

c_id r_dt
c1 2023-01-01
c1 2021-02-14
c2 2022-11-25
c2 2022-02-28
c5 2022-11-12
c4 2022-06-06

I want an exact match on c_id between the two data frames, and the rows should merge only if ‘dt’ of df1 minus ‘r_dt’ of df2 is less than 7 days. I want an inner join.

My expected result looks exactly like this:
result_df:

l_id c_id dt
1a c1 2023-01-01
1b c1 2021-02-20
1c c2 2022-11-25
1d c2 2022-01-01
1d c2 2022-03-01
1f c4 2022-06-12

I tried the following:

import pandas as pd
df1 = pd.read_csv("my_csv1.csv")
df2 = pd.read_csv("my_csv2.csv")

# merging
merge_df = pd.merge(df1, df2, how='inner', left_on=['c_id', 'dt'], right_on=['c_id', 'r_dt'])


But this does not give me the expected result. It only matches exact dates; it does not check whether the difference in days is less than or equal to 7.
Please help.

Asked By: Sur

||

Answers:

Do it in two steps: merge on the id first, then filter on the date difference, like this:

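# Parse the date columns first; read_csv loads them as strings unless parse_dates is used
df1['dt'], df2['r_dt'] = pd.to_datetime(df1['dt']), pd.to_datetime(df2['r_dt'])
merge_df = pd.merge(df1, df2, how='inner', on='c_id')
merge_df = merge_df[(merge_df['dt'] - merge_df['r_dt']).dt.days.abs() < 7]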
Answered By: Tarifazo
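
For anyone who wants to try the two-step approach without the CSV files, here is a minimal self-contained sketch. It assumes the sample rows from the question typed in by hand (the CSV contents themselves are not shown in the post), and the names `merged`, `day_gap` and `result` are only illustrative. It uses the same absolute-difference filter as the answer above, so a pair is kept when the dates are fewer than 7 days apart in either direction.

```python
import pandas as pd

# Sample data copied from the question; dates are parsed to datetime64 up front
df1 = pd.DataFrame({
    "l_id": ["1a", "1b", "1c", "1d", "1d", "1e", "1f"],
    "c_id": ["c1", "c1", "c2", "c2", "c2", "c3", "c4"],
    "dt": pd.to_datetime(["2023-01-01", "2021-02-20", "2022-11-25",
                          "2022-01-01", "2022-03-01", "2022-04-08",
                          "2022-06-12"]),
})
df2 = pd.DataFrame({
    "c_id": ["c1", "c1", "c2", "c2", "c5", "c4"],
    "r_dt": pd.to_datetime(["2023-01-01", "2021-02-14", "2022-11-25",
                            "2022-02-28", "2022-11-12", "2022-06-06"]),
})

# Step 1: inner join on c_id, pairing each df1 row with every df2 row sharing its c_id
merged = df1.merge(df2, on="c_id", how="inner")

# Step 2: keep only the pairs whose dates are fewer than 7 days apart
day_gap = (merged["dt"] - merged["r_dt"]).dt.days.abs()
result = merged.loc[day_gap < 7, ["l_id", "c_id", "dt"]].reset_index(drop=True)

print(result)
```

On this sample the row 1d / 2022-01-01 is dropped, because no c2 row in df2 has an r_dt within 7 days of it, so the output has one row fewer than the expected result listed in the question. Also note that the intermediate join holds every pairing per c_id, so it can get large when an id repeats many times in both frames.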