Python pandas: match two tables to find the latest date

Question:

I want to do a VLOOKUP-style match in pandas, like in Excel: for each threshold in Table 1, find the latest date in Table 2 on which that Name has that Value:

Table 1:

Name  Threshold1   Threshold2
A     9            8
B     14           13

Table 2:

Date   Name   Value   
1/1    A      10
1/2    A      9
1/3    A      9
1/4    A      8
1/5    A      8
1/1    B      15
1/2    B      14
1/3    B      14
1/4    B      13
1/5    B      13

The desired table is like:

Name  Threshold1   Threshold1_Date   Threshold2   Threshold2_Date
A     9            1/3               8            1/5
B     14           1/3               13           1/5
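To run the answers below, the two tables can be rebuilt as DataFrames. This is a minimal sketch: the names df1 and df2 follow the answers, and keeping Date as 'month/day' strings is an assumption, since the question doesn't say what type that column has.

```python
import pandas as pd

# Table 1: one row per Name with its two threshold values
df1 = pd.DataFrame({
    'Name': ['A', 'B'],
    'Threshold1': [9, 14],
    'Threshold2': [8, 13],
})

# Table 2: one value per Name per day; Date kept as the original 'month/day' strings
df2 = pd.DataFrame({
    'Date': ['1/1', '1/2', '1/3', '1/4', '1/5'] * 2,
    'Name': ['A'] * 5 + ['B'] * 5,
    'Value': [10, 9, 9, 8, 8, 15, 14, 14, 13, 13],
})
```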

Thanks in advance!

Asked By: Elaine Yang


Answers:

Does this work?

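# needs Python 3.8+ for the := (walrus) operator; the outer parentheses let the
# chain continue onto the .set_axis(...) line
# note: sort_values('Date') sorts these 'month/day' strings correctly, but a plain string
# sort puts '1/10' before '1/9', so parse real dates with pd.to_datetime first
result = ((df_out := df1.melt('Name', value_name='Value')
              .merge(df2, on=['Name', 'Value'])                    # every Date where Name and Value match
              .sort_values('Date')                                 # oldest first
              .drop_duplicates(['Name', 'variable'], keep='last')  # latest match per threshold
              .set_index(['Name', 'variable'])
              .unstack().sort_index(level=1, axis=1))              # group columns by threshold
          .set_axis(df_out.columns.map('_'.join), axis=1).reset_index())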

Output:

  Name Date_Threshold1  Value_Threshold1 Date_Threshold2  Value_Threshold2
0    A             1/3                 9             1/5                 8
1    B             1/3                14             1/5                13
Answered By: Scott Boston
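A self-contained sketch of the same melt, merge, drop_duplicates idea, with two changes that are assumptions rather than part of the answer: Date is parsed with format='%m/%d' only as a sort key (so the strings stay as-is in the output but sort chronologically), and pivot plus add_suffix stands in for the unstack/set_axis step.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'B'], 'Threshold1': [9, 14], 'Threshold2': [8, 13]})
df2 = pd.DataFrame({
    'Date': ['1/1', '1/2', '1/3', '1/4', '1/5'] * 2,
    'Name': ['A'] * 5 + ['B'] * 5,
    'Value': [10, 9, 9, 8, 8, 15, 14, 14, 13, 13],
})

def as_date(s):
    # assumed 'month/day' format; used only for ordering, the Date column stays text
    return pd.to_datetime(s, format='%m/%d')

latest = (df1.melt('Name', value_name='Value')                   # one row per (Name, threshold column)
             .merge(df2, on=['Name', 'Value'])                    # every Date where Name and Value match
             .sort_values('Date', key=as_date)                    # chronological, oldest first
             .drop_duplicates(['Name', 'variable'], keep='last')  # keep the latest Date
             .pivot(index='Name', columns='variable', values='Date')
             .add_suffix('_Date'))                                # Threshold1 -> Threshold1_Date

# attach the dates to Table 1 and order the columns as in the question
result = df1.join(latest, on='Name')
result = result[['Name', 'Threshold1', 'Threshold1_Date', 'Threshold2', 'Threshold2_Date']]
print(result)
```

Parsing only inside the sort key keeps the output identical to the question's desired table while avoiding the lexicographic ordering of raw strings.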

Code

# assumes df2 is already sorted by `Date` (oldest first)
# drop duplicate (Name, Value) pairs, keeping the last, i.e. latest, Date
cols = ['Name', 'Value']
s = df2.drop_duplicates(cols, keep='last').set_index(cols)['Date']

# for each threshold column, build a (Name, threshold) MultiIndex and use
# MultiIndex.map to look up the matching latest Date in s
for c in df1.filter(like='Threshold'):
    df1[c + '_date'] = df1.set_index(['Name', c]).index.map(s)

Result

  Name  Threshold1  Threshold2 Threshold1_date Threshold2_date
0    A           9           8             1/3             1/5
1    B          14          13             1/3             1/5
Answered By: Shubham Sharma
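The answer above relies on df2 already being sorted by Date. A sketch that drops that assumption by sorting before drop_duplicates, run on a deliberately shuffled df2 (the shuffle and the '%m/%d' date format are assumptions for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'B'], 'Threshold1': [9, 14], 'Threshold2': [8, 13]})
df2 = pd.DataFrame({
    'Date': ['1/1', '1/2', '1/3', '1/4', '1/5'] * 2,
    'Name': ['A'] * 5 + ['B'] * 5,
    'Value': [10, 9, 9, 8, 8, 15, 14, 14, 13, 13],
}).sample(frac=1, random_state=0)  # shuffle rows so the input is NOT in date order

cols = ['Name', 'Value']
# sort chronologically first, then keep the last (= latest) Date per (Name, Value)
s = (df2.sort_values('Date', key=lambda d: pd.to_datetime(d, format='%m/%d'))
        .drop_duplicates(cols, keep='last')
        .set_index(cols)['Date'])

# look up each (Name, threshold) pair in s, as in the answer
for c in df1.filter(like='Threshold'):
    df1[c + '_date'] = df1.set_index(['Name', c]).index.map(s)

print(df1)
```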

Here’s a way to do what your question asks:

# latest Date per (Name, Value); groupby().last() takes the last row, so df2 must be in date order
latestDtByNameVal = df2.groupby(['Name','Value']).last()
res = df1.assign(**( df1.set_index('Name').pipe(lambda d:
    {f'{col}_Date': d[[col]].rename(columns={col:'Value'})
        .set_index('Value', append=True)
        .pipe(lambda d:latestDtByNameVal.Date[d.index].to_numpy()) 
    for col in d.columns}) ))

If you want the result columns to be ordered as in your question, you can add one of the following:

# use numpy ravel (prepend 'Name', otherwise it is dropped from the result):
import numpy as np
res = res[['Name'] + list(np.ravel([[x + s for x in df1.columns if x != 'Name'] for s in ['', '_Date']], order='F'))]

# ... or, use itertools:
from itertools import chain
res = res[['Name'] + list(chain.from_iterable([[col, f'{col}_Date'] for col in df1.drop(columns='Name').columns]))]

Output:

  Name  Threshold1 Threshold1_Date  Threshold2 Threshold2_Date
0    A           9             1/3           8             1/5
1    B          14             1/3          13             1/5
Answered By: constantstranger
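In the last answer, latestDtByNameVal.Date[d.index] would likely raise a KeyError if some threshold value never appears in df2 for that Name, because pandas does not allow missing labels in list-like lookups. A sketch of the same lookup unrolled into a plain loop, using Series.reindex so that unmatched thresholds become NaN instead (the reindex swap and the names latest and keys are illustrative, not from the answer):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'B'], 'Threshold1': [9, 14], 'Threshold2': [8, 13]})
df2 = pd.DataFrame({
    'Date': ['1/1', '1/2', '1/3', '1/4', '1/5'] * 2,
    'Name': ['A'] * 5 + ['B'] * 5,
    'Value': [10, 9, 9, 8, 8, 15, 14, 14, 13, 13],
})

# last Date per (Name, Value); like the answer, this assumes df2 is in date order
latest = df2.groupby(['Name', 'Value'])['Date'].last()

res = df1.copy()
for col in ['Threshold1', 'Threshold2']:
    # (Name, threshold) pairs to look up, one per row of df1
    keys = pd.MultiIndex.from_arrays([df1['Name'], df1[col]], names=['Name', 'Value'])
    # reindex returns NaN for pairs missing from df2 instead of raising
    res[f'{col}_Date'] = latest.reindex(keys).to_numpy()

print(res)
```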