Pandas sum two columns, skipping NaN

Question:

If I add two columns to create a third, any row where either column is NaN (representing missing data in my world) ends up as NaN in the output column as well. Is there a way to skip NaNs without explicitly setting the values to 0 (which would lose the notion that those values are “missing”)?

In [42]: frame = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 4]})

In [44]: frame['c'] = frame['a'] + frame['b']

In [45]: frame
Out[45]: 
    a   b   c
0   1   3   4
1   2 NaN NaN
2 NaN   4 NaN

In the above, I would like column c to be [4, 2, 4].

Thanks…

Asked By: smontanaro


Answers:

With fillna():

frame['c'] = frame.fillna(0)['a'] + frame.fillna(0)['b']

Or, as suggested, filling each column separately:

frame['c'] = frame.a.fillna(0) + frame.b.fillna(0)

giving:

    a   b  c
0   1   3  4
1   2 NaN  2
2 NaN   4  4
Answered By: jrjc
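
As a minimal, self-contained sketch of the fillna() approach above (using the frame from the question), the following shows that fillna(0) only substitutes zeros in the temporary copies used for the addition, so the original columns keep their NaN markers:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, np.nan], "b": [3, np.nan, 4]})

# fillna(0) returns a new Series each time; frame["a"] and frame["b"]
# themselves are left untouched.
frame["c"] = frame["a"].fillna(0) + frame["b"].fillna(0)

print(frame["c"].tolist())          # [4.0, 2.0, 4.0]
print(frame["a"].isna().tolist())   # [False, False, True]
```

The missing values in a and b are therefore still distinguishable from real zeros after the sum is computed.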

Another approach:

>>> frame["c"] = frame[["a", "b"]].sum(axis=1)
>>> frame
    a   b  c
0   1   3  4
1   2 NaN  2
2 NaN   4  4
Answered By: DSM
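
The sum(axis=1) answer above works because DataFrame.sum skips NaN by default (skipna=True). As a small sketch for comparison, passing skipna=False reproduces the behaviour of the plain + operator from the question; the variable names here are only illustrative:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, np.nan], "b": [3, np.nan, 4]})
cols = frame[["a", "b"]]

# Default: NaN values are skipped when summing across each row.
skipped = cols.sum(axis=1)

# skipna=False: any NaN in a row makes that row's sum NaN,
# matching frame["a"] + frame["b"].
strict = cols.sum(axis=1, skipna=False)

print(skipped.tolist())         # [4.0, 2.0, 4.0]
print(strict.isna().tolist())   # [False, True, True]
```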

As an expansion of the answer above: frame[["a", "b"]].sum(axis=1) returns 0 for any row in which every value is NaN, as in row 3 below:

>>> frame["c"] = frame[["a", "b"]].sum(axis=1)
>>> frame
    a   b  c
0   1   3  4
1   2 NaN  2
2 NaN   4  4
3 NaN NaN  0

If you want the sum of an all-NaN row to be NaN instead, you can pass the min_count parameter, as described in the docs:

>>> frame["c"] = frame[["a", "b"]].sum(axis=1, min_count=1)
>>> frame
    a   b   c
0   1   3   4
1   2 NaN   2
2 NaN   4   4
3 NaN NaN NaN
Answered By: Ash
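
The outputs in the answer above include a fourth, all-NaN row that is not in the question's frame. A runnable sketch of that comparison, assuming a frame extended with such a row (and a pandas version that supports min_count), might look like this:

```python
import numpy as np
import pandas as pd

# Assumed frame: the question's data plus one row where both values are missing.
frame = pd.DataFrame({
    "a": [1, 2, np.nan, np.nan],
    "b": [3, np.nan, 4, np.nan],
})

# Default behaviour: an all-NaN row sums to 0.
default_sum = frame[["a", "b"]].sum(axis=1)

# min_count=1 requires at least one non-NaN value per row;
# otherwise the result for that row is NaN.
at_least_one = frame[["a", "b"]].sum(axis=1, min_count=1)

print(default_sum.tolist())           # [4.0, 2.0, 4.0, 0.0]
print(at_least_one.isna().tolist())   # [False, False, False, True]
```

With min_count=1, a row with one missing value is still summed from the value that is present, while a row with no data at all stays marked as missing.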