how to merge next nearest data points using a common datetime stamp value for a group?

Question:

I have two data frames like the samples given below:

df1 = pd.DataFrame({"items":["i1", "i1", "i1", "i2","i2", "i2"], "dates":["09-Nov-2022", "10-Aug-2022", "27-May-2022", "20-Oct-2022", "01-Nov-2022","27-Jul-2022"]})
df2 = pd.DataFrame({"items": ["i1", "i1", "i1", "i1", "i2", "i2"], "prod_mmmyyyy": ["Sep 2022", "Jun 2022", "Mar 2022", "Dec 2021", "Sep 2022", "Jun 2022"]})

Here I wanted to map df1 into df2 with the next closest date of df2.prod_mmmyyyy from df1.dates column for each category on items.
The expected result would be like this below:

result = pd.DataFrame({"items":["i1", "i1", "i1", "i1", "i2", "i2"], 
                        "prod_mmmyyyy": ["Sep 2022", "Jun 2022", "Mar 2022", "Dec 2021", "Sep 2022", "Jun 2022"],
                        "mapped_date": ["09-Nov-2022", "10-Aug-2022", "27-May-2022", "27-May-2022", "20-Oct-2022", "27-Jul-2022"]})

I have tried to convert the prod_mmmyyyy to month end date then group by on the items but after that what should be done is being very difficult for me.

Thanks in advance for any help provided.

Asked By: durjoy

||

Answers:

Try this:

df1['dates_1'] = df1['dates']

df_out = pd.merge_asof(
    df2.assign(dates=pd.to_datetime(df2["prod_mmmyyyy"], format="%b %Y")).sort_values(
        "dates"
    ),
    df1.assign(dates=pd.to_datetime(df1["dates"], format="%d-%b-%Y")).sort_values(
        "dates"
    ),
    on="dates",
    by="items",
    direction='nearest'
)

df_out

Output:

  items prod_mmmyyyy      dates      dates_1
0    i1     Dec 2021 2021-12-01  27-May-2022
1    i1     Mar 2022 2022-03-01  27-May-2022
2    i1     Jun 2022 2022-06-01  27-May-2022
3    i2     Jun 2022 2022-06-01  27-Jul-2022
4    i1     Sep 2022 2022-09-01  10-Aug-2022
5    i2     Sep 2022 2022-09-01  27-Jul-2022
Answered By: Scott Boston