iterating one dataframe and using multiple columns of it as criteria to find value in another dataframe

Question:

I have two dataframes Samples and Events

Samples:

 ID     Sample_Date  

0 500001 2021-09-28

1 500002 2021-09-29

2 500003 2021-10-15

Events:

  ID     Event_Date  Event   

0 500001 2021-09-20 BN

1 500001 2021-10-05 BN

2 500003 2021-10-09 AS

a desired output example would be:

 ID     Sample_Date  last_Event_to_sample

0 500001 2021-09-28 2021-09-20

1 500002 2021-09-29 NaN

2 500003 2021-10-15 2021-10-09

Samples has a one to many relationship to Events.

Basicly what i wanted to do is to iterate through each row of the Samples Dataframe and use the ID and the Date as a condition to search for the next smaller date in the Events Dataframe for this ID, in order to create a new column in Samples ‘last_Event_to_Sample’ .

After two days researching i cannot figure out how to approach this. This is the last step to be finally able to create my plots. I would really appreachiate it ,if someone can help me out!

Thank you a lot in advance!

Asked By: Marco Mikuš

||

Answers:

IIUC, you can use pandas.merge_asof :

df_samples["Sample_Date"] = pd.to_datetime(df_samples["Sample_Date"])
df_events["Event_Date"] = pd.to_datetime(df_events["Event_Date"])

out = (
        pd.merge_asof(df_samples, df_events,
                      left_on="Sample_Date", right_on="Event_Date",
                      by="ID", direction="backward")
       )

# Output :

print(out)

       ID Sample_Date Event_Date Event
0  500001  2021-09-28 2021-09-20    BN
1  500002  2021-09-29        NaT   NaN
2  500003  2021-10-15 2021-10-09    AS

With direction="nearest", we get this :

       ID Sample_Date Event_Date Event
0  500001  2021-09-28 2021-10-05    BN
1  500002  2021-09-29        NaT   NaN
2  500003  2021-10-15 2021-10-09    AS

And if needed, you can add a tolerance with tolerance=pd.Timedelta("7d") (for e.g) as a keyword :

       ID Sample_Date Event_Date Event
0  500001  2021-09-28        NaT   NaN
1  500002  2021-09-29        NaT   NaN
2  500003  2021-10-15 2021-10-09    AS

NB : Both dataframes need to be sorted on keys before the merge_asof.

Answered By: abokey
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.