Python: how to multiple merge between two data frames when number of variables are unknown

Question

My example master dataset is the following:

df1 = pd.DataFrame(
    data=
    [['dfs','cat', 'giraffe', 'fish'],
     ['axs','dog', 'pig', 'bird'],
     ['fgd','cow', 'lion', 'bull'],
     ['rty','mouse', 'elephant', 'sheep'],
     ['pwe', 'fox', 'tiger', 'horse']],
    columns=['othervar', 'makevar0', 'makevar1', 'makevar2']
)
print(df1)

The other dataset is as follows:

df2 = pd.DataFrame(
    data=
    [['cat', 34],
     ['pig', 23],
     ['cow', 45],
     ['elephant', 86],
     ['horse', 90]],
    columns=['varx', 'targetvar']
)
print(df2)

I want to merge df1 and df2 like follows:

merged = pd.merge(df1, df2[['varx', 'targetvar']], left_on='makevar0',
                        right_on='varx',
                        how='left', validate='m:1')
print(merged)

The issue is I want to do this for all makevar0, makeover1, and makevar2 in the master dataset; but there may be more than 3 such variables. That is the master dataset may have 2,3,4 or any number of makevars (not very large number and certainly below 10). However, all these will be named starting with ‘makevar’ followed by a number. I want to merge all the makevars with the varx in 2nd dataset to get targetvar mapped in the master data in a loop. The 2nd dataframe does not change at all.

Any help is appreciated so that I don’t have to manually check the number of makevars and write the same line of codes many times.

Asked By: econbuffin2019

||

Source

Answer 1

I would suggest using Series.map to substitute the values in each makevar column:

# create a mapping series
s = df2.set_index('varx')['targetvar']

# Substitute values in each makevar like column
for c in df1.filter(like='makevar'):
    df1[c.replace('make', 'target')] = df1[c].map(s)

Result

  othervar makevar0  makevar1 makevar2  targetvar0  targetvar1  targetvar2
0      dfs      cat   giraffe     fish        34.0         NaN         NaN
1      axs      dog       pig     bird         NaN        23.0         NaN
2      fgd      cow      lion     bull        45.0         NaN         NaN
3      rty    mouse  elephant    sheep         NaN        86.0         NaN
4      pwe      fox     tiger    horse         NaN         NaN        90.0

Answered By: Shubham Sharma

Python: how to multiple merge between two data frames when number of variables are unknown

Question:

Answers:

Result