Setting value of columns in one dataframe to another dataframe column based on condition

Question:

I have a dataframe df1 :-

id Group Family Bonus
1 tri23_1 Laavin A
2 hsgç_T2 Grendy B
3 bbbj-1Y_jn Fantol C
4 hsgç_T2 Gondow D

and have a dataframe df2 :-

id Group Family Bonus
1 tri Brendy B
2 hsgç_T2 Fantol A
5 bbbj-1Y_jn Fantol E
6 hsgç_T3 Gondow D

For id in df1 that are present in df2,I want to set Group ,Family,Bonus columns alone of df2 in df1.
Expected df1:-

id Group Family Bonus
1 tri Brendy B
2 hsgç_T2 Fantol A
3 bbbj-1Y_jn Fantol C
4 hsgç_T2 Gondow D
Asked By: Scope

||

Answers:

shared_columns = ['Group', 'Family', 'Bonus']

One-liner:

df1.update(df1.set_index('id')[shared_columns]
              .align(df2.set_index('id'), join='left')[1]
              .set_axis(df1.index, axis=0))

My original solution:

replacements = (pd.merge(df1, df2, on='id', how='left')
                  .iloc[:, -len(shared_columns):])
replacements.columns = shared_columns

df1.update(replacements)

Example:

df1 = pd.DataFrame([[1,  1, 0.1, 1], 
                    [2,  2, 0.2, 2], 
                    [3,  3, 0.3, 3]], 
                   columns=['id', 'Group', 'Family', 'Bonus'])
df2 = pd.DataFrame([[3,  30, 300, 3], 
                    [1,  10, 100, 1], 
                    [4,  40, 400, 4]], 
                   columns=['id', 'Group', 'Family', 'Bonus'])

df1:

   id  Group  Family  Bonus
0   1      1     0.1      1
1   2      2     0.2      2
2   3      3     0.3      3

df2:

   id  Group  Family  Bonus
0   3     30     300      3
1   1     10     100      1
2   4     40     400      4

pd.merge(df1, df2, on='id', how='left'
performs a left-merge, on the column id:

   id  Group_x  Family_x  Bonus_x  Group_y  Family_y  Bonus_y
0   1        1       0.1        1     10.0     100.0      1.0
1   2        2       0.2        2      NaN       NaN      NaN
2   3        3       0.3        3     30.0     300.0      3.0

replacements (after selecting the last columns, and renaming them):

   Group  Family  Bonus
0   10.0   100.0    1.0
1    NaN     NaN    NaN
2   30.0   300.0    3.0

Resulting df1:

   id  Group  Family  Bonus
0   1   10.0   100.0    1.0
1   2    2.0     0.2    2.0
2   3   30.0   300.0    3.0

Another way to select the right columns after merging:

By default, .merge uses suffixes=('_x', '_y'). So to select the columns of the right data frame, we can append '_y' to each its column name:

merged = pd.merge(df1, df2, on='id', how='left')
right_colnames = [colname + '_y' for colname in shared_columns]
replacements = merged[right_colnames]
   Group_y  Family_y  Bonus_y
0     10.0     100.0      1.0
1      NaN       NaN      NaN
2     30.0     300.0      3.0

Another solution:

temp_df = df1.reset_index().set_index('id')
temp_df.update(df2.set_index('id')[shared_columns])
df1 = temp_df.reset_index().set_index('index').rename_axis(None, axis=0)
Answered By: Vladimir Fokow

if setting index on DF is acceptable, then this is one way to accomplish it. Using update

df.set_index('id', inplace=True)
df.update(df2.set_index('id'))
df
    Group       Family  Bonus
id          
1   tri         Brendy  B
2   hsgç_T2     Fantol  A
3   bbbj-1Y_jn  Fantol  C
4   hsgç_T2     Gondow  D
Answered By: Naveed
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.