Replace a column value based on column matches with another dataframe

Question:

I have these two dataframes

DataFrame A

    column1 column2 column3 column4 
1   a       2       True    23
2   b       2       False   cdsg
3   c       3       False   asdf
4   a       2       False   sdac
5   b       1       False   asdcd

Dataframe B is a single-row dataframe like

    column1 column2 column3 column4
1   c       3       False    asdmn

What I want to do is match on the first 3 columns and, if a match is found, replace the value of column4 so that the result is this.

    column1 column2 column3 column4 
1   a       2       True    23
2   b       2       False   cdsg
3   c       3       False   asdmn
4   a       2       False   sdac
5   b       1       False   asdcd

Otherwise, if there is no match, I want to attach the row at the end. I could do that last part with a simple pd.append, but I first need to make the first part work.

Asked By: Agustin Barrachina

Answers:

I suggest merging and then using fillna:

import pandas as pd

# A left merge keeps every row of df_a; column4_b is NaN where df_b has no match
df = df_a.merge(df_b, on=['column1', 'column2', 'column3'],
                how='left', suffixes=('_a', '_b'))
# check whether the merge found at least one match:
if df['column4_b'].notna().any():
    df['column4'] = df['column4_b'].fillna(df['column4_a'])
    df = df[['column1', 'column2', 'column3', 'column4']]
else:
    # no match: attach df_b at the end
    df = pd.concat([df_a, df_b], ignore_index=True)

df
Answered By: Artyom Akselrod
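
Since Dataframe B in the question has a single row, the match can also be done with a boolean mask instead of a merge. The following is only a minimal sketch and is not taken from either answer: the function name `upsert_single_row` and its `keys` and `value_col` parameters are hypothetical, and it assumes the key columns have comparable dtypes in both dataframes.

```python
import pandas as pd


def upsert_single_row(df_a, df_b, keys, value_col):
    """Overwrite value_col where all keys match the single row of df_b; otherwise append that row.

    Hypothetical helper for illustration; assumes df_b has exactly one row.
    """
    row = df_b.iloc[0]
    # Build a mask that is True where every key column equals the row's value
    mask = pd.Series(True, index=df_a.index)
    for key in keys:
        mask &= df_a[key] == row[key]
    if mask.any():
        result = df_a.copy()  # leave the caller's dataframe untouched
        result.loc[mask, value_col] = row[value_col]
        return result
    # No match: attach the row at the end
    return pd.concat([df_a, df_b], ignore_index=True)


df_a = pd.DataFrame({
    'column1': ['a', 'b', 'c', 'a', 'b'],
    'column2': [2, 2, 3, 2, 1],
    'column3': [True, False, False, False, False],
    'column4': ['23', 'cdsg', 'asdf', 'sdac', 'asdcd'],
})
df_b = pd.DataFrame({
    'column1': ['c'], 'column2': [3], 'column3': [False], 'column4': ['asdmn'],
})

result = upsert_single_row(df_a, df_b, ['column1', 'column2', 'column3'], 'column4')
print(result)
```

If several rows of df_a share the same key values, this sketch overwrites all of them.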

You can use this code to replace the column4 value with the matching data from the second dataframe.

Code:

import pandas as pd

df_a = pd.DataFrame({
    'column1': ['a', 'b', 'c', 'a', 'b'],
    'column2': [2, 2, 3, 2, 1],
    'column3': [True, False, False, False, False],
    'column4': ['23', 'cdsg', 'asdf', 'sdac', 'asdcd']
})

df_b = pd.DataFrame({
    'column1': ['c'],
    'column2': [3],
    'column3': [False],
    'column4': ['asdmn']
})

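# Left merge keeps all rows of df_a; the two column4 columns get the default _x/_y suffixes
final_df = pd.merge(df_a, df_b, on=['column1', 'column2', 'column3'], how='left')

# Prefer the value from df_b (column4_y); fall back to df_a (column4_x) where nothing matched
final_df['column4'] = final_df['column4_y'].fillna(final_df['column4_x'])

final_df = final_df.drop(columns=['column4_x', 'column4_y'])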

print(final_df)

Output:

      column1  column2  column3 column4
0        a        2     True     23
1        b        2     False    cdsg
2        c        3     False    asdmn
3        a        2     False    sdac
4        b        1     False    asdcd
Answered By: NIKUNJ PATEL
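
The left merge above replaces matching values but does not attach unmatched rows at the end, which the question also asks for. One possible way to cover both cases, including a df_b with several rows, is sketched below. This is an assumption-laden illustration rather than part of the answer: the function name `upsert_rows` is hypothetical, key combinations in df_b are assumed to be unique, and a NaN in df_b's value column would not overwrite an existing value.

```python
import pandas as pd


def upsert_rows(df_a, df_b, keys, value_col):
    """Update value_col in df_a from df_b on matching keys, then append unmatched df_b rows.

    Hypothetical helper for illustration; assumes key combinations in df_b are unique.
    """
    new_col = value_col + '_new'
    # Step 1: left merge brings df_b's value alongside df_a's (NaN where no match)
    merged = df_a.merge(df_b[keys + [value_col]], on=keys,
                        how='left', suffixes=('', '_new'))
    merged[value_col] = merged[new_col].fillna(merged[value_col])
    merged = merged.drop(columns=new_col)

    # Step 2: anti-join to find rows of df_b whose keys do not occur in df_a
    flagged = df_b.merge(df_a[keys].drop_duplicates(), on=keys,
                         how='left', indicator=True)
    unmatched = flagged.loc[flagged['_merge'] == 'left_only'].drop(columns='_merge')
    if unmatched.empty:
        return merged
    return pd.concat([merged, unmatched], ignore_index=True)


df_a = pd.DataFrame({
    'column1': ['a', 'b', 'c', 'a', 'b'],
    'column2': [2, 2, 3, 2, 1],
    'column3': [True, False, False, False, False],
    'column4': ['23', 'cdsg', 'asdf', 'sdac', 'asdcd'],
})
# Two rows: the first matches a row of df_a, the second does not
df_b = pd.DataFrame({
    'column1': ['c', 'z'],
    'column2': [3, 9],
    'column3': [False, True],
    'column4': ['asdmn', 'new'],
})

final_df = upsert_rows(df_a, df_b, ['column1', 'column2', 'column3'], 'column4')
print(final_df)
```

The `indicator=True` argument of `merge` adds a `_merge` column that marks each row as `left_only`, `right_only`, or `both`, which is what makes the anti-join in step 2 possible.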