Replace values in pandas.DataFrame using MultiIndex
Question:
What I want to do
- I have two pandas.DataFrame objects, df1 and df2. Both have the same columns.
- All indices in df2 are also found in df1, but there are some indices that only df1 has.
- For rows whose index appears in both df1 and df2, use the rows of df2.
- For rows whose index appears only in df1, use the rows of df1.
In short, "replace the values of df1 with the values of df2 based on the MultiIndex".
import pandas as pd
index_names = ['index1', 'index2']
columns = ['column1', 'column2']
data1 = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
index1 = [['i1', 'i1', 'i1', 'i2', 'i2'], ['A', 'B', 'C', 'B', 'C']]
df1 = pd.DataFrame(data1, index=pd.MultiIndex.from_arrays(index1, names=index_names), columns=columns)
print(df1)
## OUTPUT
#                column1  column2
# index1 index2
# i1     A             1        2
#        B             2        3
#        C             3        4
# i2     B             4        5
#        C             5        6
data2 = [[11, 12], [12, 13]]
index2 = [['i2', 'i1'], ['C', 'C']]
df2 = pd.DataFrame(data2, index=pd.MultiIndex.from_arrays(index2, names=index_names), columns=columns)
print(df2)
## OUTPUT
#                column1  column2
# index1 index2
# i2     C            11       12
# i1     C            12       13
## DO SOMETHING!
## EXPECTED OUTPUT
#                column1  column2
# index1 index2
# i1     A             1        2
#        B             2        3
#        C            12       13  # REPLACED!
# i2     B             4        5
#        C            11       12  # REPLACED!
Environment
Python 3.10.5
Pandas 1.4.3
Answers:
You can try with pd.merge() on df2 and then fill the missing values from df1 with .fillna():
pd.merge(df2, df1, left_index=True, right_index=True, how='right', suffixes=('', '_df1')).fillna(df1).iloc[:,:2]
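For reference, here is the same merge approach spelled out as a self-contained sketch on the data from the question. The names merged and result are only illustrative, and the final astype() call is an optional extra that is not part of the one-liner above: the right join introduces NaN for rows that exist only in df1, which upcasts the columns to float, so casting back restores the original dtypes.

```python
import pandas as pd

index_names = ['index1', 'index2']
columns = ['column1', 'column2']

# Data from the question
df1 = pd.DataFrame(
    [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]],
    index=pd.MultiIndex.from_arrays(
        [['i1', 'i1', 'i1', 'i2', 'i2'], ['A', 'B', 'C', 'B', 'C']],
        names=index_names),
    columns=columns,
)
df2 = pd.DataFrame(
    [[11, 12], [12, 13]],
    index=pd.MultiIndex.from_arrays([['i2', 'i1'], ['C', 'C']], names=index_names),
    columns=columns,
)

# Right join on the index: every row of df1 is kept, and the unsuffixed
# columns hold df2's values (NaN where df2 has no matching row).
merged = pd.merge(df2, df1, left_index=True, right_index=True,
                  how='right', suffixes=('', '_df1'))

# Fill the gaps from df1, keep only the original columns,
# and cast back to df1's dtypes (illustrative extra step).
result = merged.fillna(df1)[columns].astype(df1.dtypes.to_dict())
print(result.sort_index())
```

Selecting by column name instead of .iloc[:, :2] is a small design choice here: it does not depend on the column order that the merge produces.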
I assume that there is a typo in your question and it should be:
index2 = [['i2', 'i1'], ['C', 'C']]
You can use direct assignment via .loc or a call to .update:
>>> df3 = df1.copy()
>>> df3.update(df2)
>>> df3
               column1  column2
index1 index2
i1     A           1.0      2.0
       B           2.0      3.0
       C          12.0     13.0
i2     B           4.0      5.0
       C          11.0     12.0
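The .loc variant mentioned above is not shown, so here is a minimal sketch of what it could look like on the question's data. It assumes, as the question states, that every index in df2 also exists in df1; if that does not hold, .loc is expected to raise a KeyError for the missing labels. The name df3 follows the .update example.

```python
import pandas as pd

index_names = ['index1', 'index2']
columns = ['column1', 'column2']

# Data from the question
df1 = pd.DataFrame(
    [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]],
    index=pd.MultiIndex.from_arrays(
        [['i1', 'i1', 'i1', 'i2', 'i2'], ['A', 'B', 'C', 'B', 'C']],
        names=index_names),
    columns=columns,
)
df2 = pd.DataFrame(
    [[11, 12], [12, 13]],
    index=pd.MultiIndex.from_arrays([['i2', 'i1'], ['C', 'C']], names=index_names),
    columns=columns,
)

# Work on a copy so df1 itself stays unchanged
df3 = df1.copy()

# Overwrite the rows whose (index1, index2) labels appear in df2;
# pandas aligns df2's rows to the selected labels.
df3.loc[df2.index, :] = df2
print(df3)
```

Unlike the merge and .update approaches, this assignment never introduces NaN, so it is less likely to turn integer columns into floats.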