Pandas map two dataframe based on column name mentioned on the other df and partial match to derive new column


I have two dataframes df1 & df2 as below.

import pandas as pd

data1 = {'Column1': [1, 2, 3],
        'Column2': ['Account', 'Biscut', 'Super'],
        'Column3': ['Funny', 'Super', 'Nice']}

df1 = pd.DataFrame(data1)

data2 = {'ColumnName':['Column2','Column3','Column1'],
df2 = pd.DataFrame(data2)

I want to add new column TarName to the df1 by partially matching the ifExists value from df2 against the ColumnName that Mentioned in the df2 with df1.

My Expected Ouput is:

Column1    column2     column3  TarName
1          Account     Funny    Account_Name
2          Biscut      Super    Super_name  
3          Super       Nice     Val_3

I have tried below code. This code able to partial map but only to one column. With this approach I Need to create dictionaries as many column mapping I have and need to apply as many.

Is there more dynamic approach ?

df2_Column2_dict = df2[df2['ColumnName']=='Column2'].set_index(['ifExist'])['TarName'].to_dict()
pat = r'({})'.format('|'.join(df2_Column2_dict.keys()))
extracted = df1['Column2'].str.extract(pat, expand=False).dropna()
df1['TarName'] =  extracted.apply(lambda x: df2_Column2_dict[x]).reindex(df2.index)

Asked By: Roshan Nuvvula



I hope help you with the code. For each concept/line is the explanation with a comment.

import pandas as pd

# Create DataFrames
data1 = {'Column1': [1, 2, 3],
        'Column2': ['Account', 'Biscut', 'Super'],
        'Column3': ['Funny', 'Super', 'Nice']}
df1 = pd.DataFrame(data1)

data2 = {'ColumnName':['Column2','Column3'],
df2 = pd.DataFrame(data2)

# Initialize new column in df1
df1['TarName'] = ''

# Iterate over each row in df2
for _, row in df2.iterrows():
    # Column in df1 to search
    column_name = row['ColumnName']  
    # Partial value to search for
    if_exist = row['ifExist']        
    # Value to assign if there is a match
    tar_name = row['TarName']        

    # Find partial matches and assign TarName
    mask = df1[column_name].str.contains(if_exist, case=False, na=False)
    df1.loc[mask, 'TarName'] = tar_name

# Show the result

Edit: Tell if you need to solve it without loops

Answered By: spalominor
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.