Augment DataFrame index

Question

I want to copy a column ('b') from one DataFrame (df2) into another (df1). Both DataFrames use the same index column, but df2's index extends a bit further than df1's, and it is missing some of df1's index labels.

This is the current behaviour:

>>> import pandas as pd
>>> pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
   a  b
0  1  4
1  2  5
2  3  6
>>> 
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> df1 = df.set_index(['a'])
>>> df1
   b
a   
1  4
2  5
3  6
>>> dg = pd.DataFrame({'a': [3, 4, 5], 'b': [7, 8, 9]})
>>> dg
   a  b
0  3  7
1  4  8
2  5  9
>>> df2 = dg.set_index('a')
>>> df2
   b
a   
3  7
4  8
5  9
>>> df1['b'] = df2['b']
>>> df1
     b
a     
1  NaN
2  NaN
3  7.0

When I run df1['b'] = df2['b'], the rows of df1 whose index labels are not in df2 become NaN, and the index labels of df2 that aren't in df1 are not carried over into df1.
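The behaviour comes from index alignment: when a Series is assigned to a DataFrame column, pandas first aligns it on the DataFrame's existing index. The sketch below rebuilds the two frames from the transcript above (an assumption for self-containment) and checks that the assignment gives the same column as reindexing df2['b'] to df1.index.

```python
import pandas as pd

# Rebuild the frames from the question; both are indexed by column 'a'.
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}).set_index('a')
df2 = pd.DataFrame({'a': [3, 4, 5], 'b': [7, 8, 9]}).set_index('a')

# Conform df2['b'] to df1's index: labels 1 and 2 are missing from df2,
# so they become NaN, while labels 4 and 5 are simply dropped.
aligned = df2['b'].reindex(df1.index)
print(aligned)

# Column assignment performs the same alignment, so the results match.
df1['b'] = df2['b']
print(df1['b'].equals(aligned))  # True
```

Because df1's index is never extended by the assignment, the extra labels of df2 have nowhere to go; both answers below work around this by building a result on the union of the two indexes.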

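Is there any way to change this behaviour so that the resulting DataFrame contains the index labels of both frames (a = 1 to 5), taking b from df2 where it exists and from df1 otherwise (b = 4, 5, 7, 8, 9)?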

Asked By: orange

Answer 1

One option is to reindex() df2 on the union of both indexes and then fill the missing values from df1:

df2 = df2.reindex(df1.index.union(df2.index))    
df2['b'] = df2['b'].fillna(df1['b'])

df2
#     b
#a  
#1  4.0
#2  5.0
#3  7.0
#4  8.0
#5  9.0
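A self-contained version of the same reindex/fillna approach, with each step commented, might look like the sketch below. It assumes the df1/df2 from the question; the name `result` is only for illustration, and the answer itself overwrites df2 instead.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}).set_index('a')
df2 = pd.DataFrame({'a': [3, 4, 5], 'b': [7, 8, 9]}).set_index('a')

# Step 1: the target index is the union of both indexes (1..5, sorted).
full_index = df1.index.union(df2.index)

# Step 2: reindex df2 onto it; labels present only in df1 (1 and 2) become NaN,
# which also turns the column into float64.
result = df2.reindex(full_index)

# Step 3: fill the gaps from df1. fillna with a Series aligns on the index,
# so only NaN slots are filled and df2's value for label 3 (7) is kept.
result['b'] = result['b'].fillna(df1['b'])
print(result)
```

The order matters: reindexing df2 and filling from df1 is what gives df2 priority on the overlapping label 3.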

Answered By: Psidom

Answer 2

This is a use case for combine_first. It gives priority to the values of the calling DataFrame and fills any missing values from the other one. It also adds the rows of the other DataFrame whose labels don't appear in the calling one, so the result's index is the union of both.

df2.combine_first(df1)
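To see the priority rule concretely, the sketch below (again assuming the df1/df2 from the question) calls combine_first in both directions. Only the overlapping label 3 differs; the resulting dtype may be int64 or float64 depending on the pandas version, so the comments describe values rather than dtypes.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}).set_index('a')
df2 = pd.DataFrame({'a': [3, 4, 5], 'b': [7, 8, 9]}).set_index('a')

# df2 is the caller, so its value wins for the shared label 3 (b = 7);
# labels 1 and 2 exist only in df1 and are filled from it.
combined = df2.combine_first(df1)
print(combined)

# Swapping caller and argument flips the priority: label 3 now takes df1's 6.
df1_first = df1.combine_first(df2)
print(df1_first)
```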

Answered By: Ted Petrou
