Pandas: Merge on exact ID and closest date

Question:

I’m trying to merge two Pandas dataframes on two columns. One column has a unique identifier that could be used to simply .merge() the two dataframes. However, the second column merge would actually use .merge_asof() because it would need to find the closest date, not an exact date match.

There is a similar question here: Pandas Merge on Name and Closest Date, but it was asked and answered nearly three years ago, and merge_asof() is a much newer addition.

I asked a similar question here a couple of months ago, but the solution only needed merge_asof() without any exact matches required.

In the interest of including some code, it would look something like this:

df = pd.merge_asof(df1, df2, left_on=['ID','date_time'], right_on=['ID','date_time'])

where the IDs will match exactly, but the date_times will be "near matches".

Any help is greatly appreciated.

Asked By: elPastor


Answers:

Consider merging first on ID, then using DataFrame.apply to find, for each merged row, the latest date_time from the first dataframe (among rows with the same ID) that is earlier than that row's date_time from the second dataframe.

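import pandas as pd

# Initial merge: within each ID, every df1 row is paired with every df2 row
mdf = pd.merge(df1, df2, on=['ID'])

def f(row):
    # Latest df1 date_time (date_time_x) for this ID that is strictly
    # earlier than this row's df2 date_time (date_time_y)
    earlier = mdf[(mdf['ID'] == row['ID']) &
                  (mdf['date_time_x'] < row['date_time_y'])]
    return earlier['date_time_x'].max()

# Keep only the rows whose df1 date equals that conditional max
mdf = mdf[mdf['date_time_x'] == mdf.apply(f, axis=1)].reset_index(drop=True)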

This assumes you want to keep all rows of df2 (i.e., right join). Simply flip _x / _y suffixes for left join.
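
To make the filtering step concrete, here is a minimal sketch on made-up data. The sample values and the left_val / right_val columns are assumptions for illustration only; the ID and date_time column names follow the question.

```python
import pandas as pd

# Hypothetical sample data (not from the question)
df1 = pd.DataFrame({
    'ID': ['a', 'a', 'b'],
    'date_time': pd.to_datetime(['2017-01-01', '2017-01-05', '2017-01-03']),
    'left_val': [1, 2, 3],
})
df2 = pd.DataFrame({
    'ID': ['a', 'b'],
    'date_time': pd.to_datetime(['2017-01-04', '2017-01-10']),
    'right_val': [10, 20],
})

# Pair every df1 row with every df2 row that shares its ID;
# the two date_time columns get the _x (df1) and _y (df2) suffixes
mdf = pd.merge(df1, df2, on=['ID'])

def latest_earlier(row):
    # Latest df1 date for this ID that is strictly before the df2 date
    earlier = mdf[(mdf['ID'] == row['ID']) &
                  (mdf['date_time_x'] < row['date_time_y'])]
    return earlier['date_time_x'].max()

# Keep only the pairings whose df1 date is that latest earlier date
result = mdf[mdf['date_time_x'] == mdf.apply(latest_earlier, axis=1)]
result = result.reset_index(drop=True)
print(result)
```

With this data, the 'a' row dated 2017-01-05 is dropped because it falls after the df2 date of 2017-01-04, so only the rows with left_val 1 and 3 remain.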

Answered By: Parfait

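The solution above works on a small dataset, but it scales poorly: the function passed to apply rescans the whole merged dataframe for every row, so it slows down quickly as the data grows.

Instead, use merge_asof with the by argument. by matches ID exactly, on matches date_time approximately, and direction='nearest' picks the closest date in either direction: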

df = pd.merge_asof(df1.sort_values('date_time'), df2.sort_values('date_time'),
                   on='date_time', by='ID', direction='nearest')
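
For anyone who wants to try this out, here is a small sketch on made-up data. The sample values and the left_val, right_val and matched_date_time columns are assumptions for illustration. Note that merge_asof requires both frames to be sorted by the on key, and that the optional tolerance argument caps how far apart a "near match" may be.

```python
import pandas as pd

# Hypothetical sample data (not from the question), deliberately unsorted
df1 = pd.DataFrame({
    'ID': ['a', 'b', 'a'],
    'date_time': pd.to_datetime(['2017-01-05', '2017-01-03', '2017-01-01']),
    'left_val': [2, 3, 1],
})
df2 = pd.DataFrame({
    'ID': ['a', 'b', 'a'],
    'date_time': pd.to_datetime(['2017-01-04', '2017-01-10', '2017-01-20']),
    'right_val': [10, 20, 30],
})

# merge_asof needs both frames sorted by the `on` column
left = df1.sort_values('date_time')
# Copy the right-hand date so the result shows which date was matched
right = df2.sort_values('date_time').assign(
    matched_date_time=lambda d: d['date_time'])

# Exact match on ID, closest match (in either direction) on date_time
nearest = pd.merge_asof(left, right, on='date_time', by='ID',
                        direction='nearest')

# Same merge, but a match more than 2 days away is left as NaN
capped = pd.merge_asof(left, right, on='date_time', by='ID',
                       direction='nearest', tolerance=pd.Timedelta('2D'))

print(nearest)
print(capped)
```

The result keeps every row of the left frame, in sorted date_time order. Rows with no match inside the tolerance get NaN in the right-hand columns rather than being dropped.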

Answered By: ga1996