join or merge with overwrite in pandas

Question:

I want to perform a join/merge/append operation on a dataframe with datetime index.

Let’s say I have df1 and I want to add df2 to it. df2 can have fewer or more columns, and overlapping indexes. For all rows where the indexes match, if df2 has the same column as df1, I want the values of df1 be overwritten with those from df2.

How can I obtain the desired result?

Asked By: saroele

||

Answers:

How about: df2.combine_first(df1)?

In [33]: df2
Out[33]: 
                   A         B         C         D
2000-01-03  0.638998  1.277361  0.193649  0.345063
2000-01-04 -0.816756 -1.711666 -1.155077 -0.678726
2000-01-05  0.435507 -0.025162 -1.112890  0.324111
2000-01-06 -0.210756 -1.027164  0.036664  0.884715
2000-01-07 -0.821631 -0.700394 -0.706505  1.193341
2000-01-10  1.015447 -0.909930  0.027548  0.258471
2000-01-11 -0.497239 -0.979071 -0.461560  0.447598

In [34]: df1
Out[34]: 
                   A         B         C
2000-01-03  2.288863  0.188175 -0.040928
2000-01-04  0.159107 -0.666861 -0.551628
2000-01-05 -0.356838 -0.231036 -1.211446
2000-01-06 -0.866475  1.113018 -0.001483
2000-01-07  0.303269  0.021034  0.471715
2000-01-10  1.149815  0.686696 -1.230991
2000-01-11 -1.296118 -0.172950 -0.603887
2000-01-12 -1.034574 -0.523238  0.626968
2000-01-13 -0.193280  1.857499 -0.046383
2000-01-14 -1.043492 -0.820525  0.868685

In [35]: df2.comb
df2.combine        df2.combineAdd     df2.combine_first  df2.combineMult    

In [35]: df2.combine_first(df1)
Out[35]: 
                   A         B         C         D
2000-01-03  0.638998  1.277361  0.193649  0.345063
2000-01-04 -0.816756 -1.711666 -1.155077 -0.678726
2000-01-05  0.435507 -0.025162 -1.112890  0.324111
2000-01-06 -0.210756 -1.027164  0.036664  0.884715
2000-01-07 -0.821631 -0.700394 -0.706505  1.193341
2000-01-10  1.015447 -0.909930  0.027548  0.258471
2000-01-11 -0.497239 -0.979071 -0.461560  0.447598
2000-01-12 -1.034574 -0.523238  0.626968       NaN
2000-01-13 -0.193280  1.857499 -0.046383       NaN
2000-01-14 -1.043492 -0.820525  0.868685       NaN

Note that it takes the values from df1 for indices that do not overlap with df2. If this doesn’t do exactly what you want I would be willing to improve this function / add options to it.

Answered By: Wes McKinney

For a merge like this, the update method of a DataFrame is useful.

Taking the examples from the documentation:

import pandas as pd
import numpy as np

df1 = pd.DataFrame([[np.nan, 3., 5.], [-4.6, 2.1, np.nan],
                   [np.nan, 7., np.nan]])
df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]],
                   index=[1, 2])

Data before the update:

>>> df1
     0    1    2
0  NaN  3.0  5.0
1 -4.6  2.1  NaN
2  NaN  7.0  NaN
>>>
>>> df2
      0    1    2
1 -42.6  NaN -8.2
2  -5.0  1.6  4.0

Let’s update df1 with data from df2:

df1.update(df2)

Data after the update:

>>> df1
      0    1    2
0   NaN  3.0  5.0
1 -42.6  2.1 -8.2
2  -5.0  1.6  4.0

Remarks:

  • It’s important to notice that this is an operation “in place”, modifying the DataFrame that calls update.
  • Also note that non NaN values in df1 are not overwritten with NaN values in df2
Answered By: Nicolás Ozimica
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.