Adding two pandas dataframes

Question:

I have two dataframes, both indexed by timeseries. I need to add the elements together to form a new dataframe, but only if the index and column are the same. If the item does not exist in one of the dataframes then it should be treated as a zero.

I’ve tried using .add but this sums regardless of index and column. Also tried a simple combined_data = dataframe1 + dataframe2 but this give a NaN if both dataframes don’t have the element.

Any suggestions?

Asked By: cs0679

||

Answers:

If I understand you correctly, you want something like:

(x.reindex_like(y).fillna(0) + y.fillna(0).fillna(0))

This will give the sum of the two dataframes. If a value is in one dataframe and not the other, the result at that position will be that existing value (look at B0 in X and B0 in Y and look at final output). If a value is missing in both dataframes, the result at that position will be zero (look at B1 in X and B1 in Y and look at final output).

>>> x
   A   B   C
0  1   2 NaN
1  3 NaN   4
>>> y
    A   B   C
0   8 NaN  88
1   2 NaN   5
2  10  11  12
>>> (x.reindex_like(y).fillna(0) + y.fillna(0).fillna(0))
    A   B   C
0   9   2  88
1   5   0   9
2  10  11  12
Answered By: BrenBarn

How about x.add(y, fill_value=0)?

import pandas as pd

df1 = pd.DataFrame([(1,2),(3,4),(5,6)], columns=['a','b'])
Out: 
   a  b
0  1  2
1  3  4
2  5  6

df2 = pd.DataFrame([(100,200),(300,400),(500,600)], columns=['a','b'])
Out: 
     a    b
0  100  200
1  300  400
2  500  600

df_add = df1.add(df2, fill_value=0)
Out: 
     a    b
0  101  202
1  303  404
2  505  606
Answered By: Wes McKinney

For making more general the answer… first I will take the common index for synchronizing both dataframes, then I will join each of them to my pattern (dates) and I will sum the columns of the same name and finally join both dataframes (deleting added columns in one of them),

you can see an example (with google’s stock prices taken from google) here:

import numpy as np
import pandas as pd
import datetime as dt

prices = pd.DataFrame([[553.0, 555.5, 549.3, 554.11, 0],
                       [556.8, 556.8, 544.05, 545.92, 545.92],
                       [545.5, 546.89, 540.97, 542.04, 542.04]],
                       index=[dt.datetime(2014,11,04), dt.datetime(2014,11,05), dt.datetime(2014,11,06)],
                       columns=['Open', 'High', 'Low', 'Close', 'Adj Close'])

corrections = pd.DataFrame([[0, 555.22], [1238900, 0]],
                    index=[dt.datetime(2014,11,3), dt.datetime(2014,11,4)],
                    columns=['Volume', 'Adj Close'])

dates = pd.DataFrame(prices.index, columns = ['Dates']).append(pd.DataFrame(corrections.index, columns = ['Dates'])).drop_duplicates('Dates').set_index('Dates').sort(axis=0)
df_corrections = dates.join(corrections).fillna(0)
df_prices = dates.join(prices).fillna(0)

for col in prices.columns:
    if col in corrections.columns:
        df_prices[col]+=df_corrections[col]
        del df_corrections[col]

df_prices = df_prices.join(df_corrections)
Answered By: Xavi

Both the above answers – fillna(0) and a direct addition would give you Nan values if either of them have different structures.

Its Better to use fill_value

df.add(other_df, fill_value=0)

Answered By: Prafulla Pallal
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.