Augment DataFrame index

Question:

I want to copy a column ('b') from one DataFrame (df2) into another (df1). Both DataFrames use the same index column, but df2's index extends further than df1's and is missing some of df1's index labels.

This is the current behaviour:

>>> import pandas as pd
>>> pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
   a  b
0  1  4
1  2  5
2  3  6
>>> 
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> df1 = df.set_index(['a'])
>>> df1
   b
a   
1  4
2  5
3  6
>>> dg = pd.DataFrame({'a': [3, 4, 5], 'b': [7, 8, 9]})
>>> dg
   a  b
0  3  7
1  4  8
2  5  9
>>> df2 = dg.set_index('a')
>>> df2
   b
a   
3  7
4  8
5  9
>>> df1['b'] = df2['b']
>>> df1
     b
a     
1  NaN
2  NaN
3  7.0

When I call df1['b'] = df2['b'], the rows of df1 whose index labels are not in df2 become NaN, and the index labels of df2 that aren't in df1 are not carried over into df1.

Is there any way to change this behaviour so that the resulting DataFrame looks like this?

>>> df1
     b
a     
1  4
2  5
3  7
4  8
5  9
Asked By: orange


Answers:

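One option is to reindex() df2 on the union of both indices and then fill the missing values from df1:

# Extend df2 to the union of both indices; labels found only in df1 become NaN.
df2 = df2.reindex(df1.index.union(df2.index))
# Fill those NaN values from df1 (fillna aligns on the index labels).
df2['b'] = df2['b'].fillna(df1['b'])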

df2
#     b
#a  
#1  4.0
#2  5.0
#3  7.0
#4  8.0
#5  9.0
Answered By: Psidom
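
For anyone who wants to run the reindex-then-fill approach above without the question's REPL session, here is a minimal self-contained sketch. It rebuilds df1 and df2 from the question's data and, as an assumption on my part rather than part of the answer, stores the outcome in new variables (full_index, result) instead of overwriting df2.

```python
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}).set_index('a')
df2 = pd.DataFrame({'a': [3, 4, 5], 'b': [7, 8, 9]}).set_index('a')

# Union of both indices; rows that df2 lacks are filled with NaN.
full_index = df1.index.union(df2.index)
result = df2.reindex(full_index)

# Fill the NaN rows from df1; fillna matches values by index label.
result['b'] = result['b'].fillna(df1['b'])

print(result)
```

Because the reindex step introduces NaN, column 'b' is printed as floats, which matches the output shown in the answer. df2 itself is left unchanged in this variant.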

This is a use case for combine_first. It prioritizes the calling DataFrame and fills any missing values from the second one. It also adds the rows of the second DataFrame whose labels are not in the first, so the result covers the union of both indices.

df2.combine_first(df1)
Answered By: Ted Petrou
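
To see how the choice of calling DataFrame matters, here is a small self-contained sketch using the question's data. The variable names result and reversed_result are only illustrative and do not come from the answer.

```python
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}).set_index('a')
df2 = pd.DataFrame({'a': [3, 4, 5], 'b': [7, 8, 9]}).set_index('a')

# df2 is the caller, so its value wins for the shared label (a == 3).
result = df2.combine_first(df1)
print(result)

# Swapping the roles keeps df1's value for the shared label instead.
reversed_result = df1.combine_first(df2)
print(reversed_result)
```

With df2 as the caller, label 3 keeps df2's value (7), which is what the question asks for. With df1 as the caller, label 3 would keep df1's value (6). Whether column 'b' comes back as integers or floats may depend on the pandas version.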