Replace values in pandas.DataFrame using MultiIndex

Question:

What I want to do

  • I have two pandas.DataFrame objects, df1 and df2, with the same columns.
  • Every index in df2 also appears in df1, but some indices appear only in df1.
  • For rows whose index appears in both df1 and df2, use the row from df2.
  • For rows whose index appears only in df1, use the row from df1.

In short, "replace values of df1 with values of df2 based on the MultiIndex".

import pandas as pd

index_names = ['index1', 'index2']
columns = ['column1', 'column2']

data1 = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
index1 = [['i1', 'i1', 'i1', 'i2', 'i2'], ['A', 'B', 'C', 'B', 'C']]
df1 = pd.DataFrame(data1, index=pd.MultiIndex.from_arrays(index1, names=index_names), columns=columns)
print(df1)
## OUTPUT
#               column1  column2
#index1 index2                  
#i1     A             1        2
#       B             2        3
#       C             3        4
#i2     B             4        5
#       C             5        6

data2 = [[11, 12], [12, 13]]
index2 = [['i2', 'i1'], ['C', 'C']]
df2 = pd.DataFrame(data2, index=pd.MultiIndex.from_arrays(index2, names=index_names), columns=columns)
print(df2)
## OUTPUT
#               column1  column2
#index1 index2                  
#i2     C            11       12
#i1     C            12       13

## DO SOMETHING!

## EXPECTED OUTPUT
#               column1  column2
#index1 index2                  
#i1     A             1        2
#       B             2        3
#       C            12       13 # REPLACED!
#i2     B             4        5
#       C            11       12 # REPLACED!

Environment

Python 3.10.5
Pandas 1.4.3

Asked By: dmjy


Answers:

You can try pd.merge() with df2 on the left and a right join, so that every row of df1 is kept, and then fill the missing values from df1 with .fillna():

pd.merge(df2, df1, left_index=True, right_index=True, how='right', suffixes=('', '_df1')).fillna(df1).iloc[:,:2]

I assume that there is a typo in your question and it should be:

index2 = [['i2', 'i1'], ['C', 'C']]
Answered By: max_jump
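
To make the merge one-liner above easier to follow, here is a minimal sketch that runs the same steps one at a time on the question's data. The intermediate names merged, filled and result are only illustrative, and selecting by column name plus the final astype() call are additions that are not in the original answer (they assume you want the original integer dtypes back after fillna() leaves floats).

```python
import pandas as pd

index_names = ['index1', 'index2']
columns = ['column1', 'column2']

df1 = pd.DataFrame(
    [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]],
    index=pd.MultiIndex.from_arrays(
        [['i1', 'i1', 'i1', 'i2', 'i2'], ['A', 'B', 'C', 'B', 'C']],
        names=index_names,
    ),
    columns=columns,
)
df2 = pd.DataFrame(
    [[11, 12], [12, 13]],
    index=pd.MultiIndex.from_arrays([['i2', 'i1'], ['C', 'C']], names=index_names),
    columns=columns,
)

# Step 1: right join on the index keeps every row of df1.
# df2's columns keep their names; df1's columns get the '_df1' suffix.
# Rows that exist only in df1 have NaN in column1/column2 at this point.
merged = pd.merge(df2, df1, left_index=True, right_index=True,
                  how='right', suffixes=('', '_df1'))

# Step 2: fill those NaN values from df1, aligned on index and column name.
filled = merged.fillna(df1)

# Step 3: keep only the original columns (the answer uses .iloc[:, :2])
# and cast back to df1's dtypes, since the NaN step produced floats.
result = filled[columns].astype(df1.dtypes.to_dict())
print(result)
```

Selecting by column name instead of position is a small design choice here: it keeps working if the column order of the merged frame ever differs from what .iloc[:, :2] expects.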

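You can use direct assignment via .loc or a call to .update. Note that .update modifies the DataFrame in place, so work on a copy if you want to keep df1 unchanged, and that the values come back as floats in the output below: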

>>> df3 = df1.copy()
>>> df3.update(df2)
>>> df3
               column1  column2
index1 index2                  
i1     A           1.0      2.0
       B           2.0      3.0
       C          12.0     13.0
i2     B           4.0      5.0
       C          11.0     12.0
Answered By: Cameron Riddell
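
The last answer mentions direct assignment via .loc without showing it. A minimal sketch of both routes on the question's data might look like the following. The names df_loc and df_upd are illustrative only, and the astype() call after update() is an assumption about what you want (integer columns restored), not something the answer itself does.

```python
import pandas as pd

index_names = ['index1', 'index2']
columns = ['column1', 'column2']

df1 = pd.DataFrame(
    [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]],
    index=pd.MultiIndex.from_arrays(
        [['i1', 'i1', 'i1', 'i2', 'i2'], ['A', 'B', 'C', 'B', 'C']],
        names=index_names,
    ),
    columns=columns,
)
df2 = pd.DataFrame(
    [[11, 12], [12, 13]],
    index=pd.MultiIndex.from_arrays([['i2', 'i1'], ['C', 'C']], names=index_names),
    columns=columns,
)

# Option 1: label-based assignment. The rows of the copy are selected
# by df2's index labels, and df2 is aligned to those labels on assignment.
# This relies on every label in df2 also existing in df1, as the question states.
df_loc = df1.copy()
df_loc.loc[df2.index, :] = df2

# Option 2: update() overwrites matching cells in place.
# Cast back to df1's dtypes in case update() converted the columns to float.
df_upd = df1.copy()
df_upd.update(df2)
df_upd = df_upd.astype(df1.dtypes.to_dict())

print(df_loc)
print(df_upd)
```

One difference worth keeping in mind: update() skips NaN values in df2, so a missing value in df2 never overwrites df1, whereas the .loc assignment copies whatever df2 holds for the selected rows.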