Merge on multiple columns and fill NaN values in Python

Question:

I have df1 that looks like this:

STATE      YEAR       EVENT_TYPE       DAMAGE
ALABAMA    1962       Tornado          27
ALABAMA    1962       Flood            7
ALABAMA    1963       Thunderstorm     12
...

and df2 that looks like this:

STATE      YEAR       TORNADO      THUNDERSTORM      FLOOD
ALABAMA    1962       NaN          NaN               NaN
ALABAMA    1963       NaN          NaN               NaN
...

I want to merge these two DataFrames so that the final output looks like this:

STATE      YEAR       TORNADO      THUNDERSTORM      FLOOD
ALABAMA    1962       27           NaN               7
...

I'm having a hard time figuring out how to do this.

Asked By: David


Answers:

Pivot df1 so that each EVENT_TYPE becomes its own column, then merge the result onto the key columns of df2:

cols = ['STATE', 'YEAR']

out = df2[cols].merge(df1.pivot(index=cols, columns='EVENT_TYPE', values='DAMAGE'),
                      left_on=cols, right_index=True)
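
As a self-contained sketch of this approach, the snippet below rebuilds the sample rows from the question and runs the pivot and merge end to end. The upper-casing of the pivoted column names, the how="left" argument, and the final reindex are assumptions added here so the result matches df2's layout; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df1 = pd.DataFrame({
    "STATE": ["ALABAMA", "ALABAMA", "ALABAMA"],
    "YEAR": [1962, 1962, 1963],
    "EVENT_TYPE": ["Tornado", "Flood", "Thunderstorm"],
    "DAMAGE": [27, 7, 12],
})
df2 = pd.DataFrame({
    "STATE": ["ALABAMA", "ALABAMA"],
    "YEAR": [1962, 1963],
    "TORNADO": np.nan,
    "THUNDERSTORM": np.nan,
    "FLOOD": np.nan,
})

cols = ["STATE", "YEAR"]

# Long -> wide: one column per event type, indexed by (STATE, YEAR).
wide = df1.pivot(index=cols, columns="EVENT_TYPE", values="DAMAGE")

# Assumption: match df2's upper-case column names ("Tornado" -> "TORNADO").
wide.columns = wide.columns.str.upper()

# Assumption: how="left" keeps every (STATE, YEAR) row of df2,
# even when df1 has no events for it.
out = df2[cols].merge(wide, left_on=cols, right_index=True, how="left")

# Restore df2's column order; event types missing from df1 stay NaN.
out = out.reindex(columns=df2.columns)

print(out)
```

Note that pivot raises a ValueError when the same (STATE, YEAR, EVENT_TYPE) combination appears more than once. In that case pivot_table with an aggregation such as aggfunc="sum" is one possible substitute.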
Answered By: mozway

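You can use df1.update(df2) to overwrite values in df1 with the non-NaN values from df2, aligned on index and column labels. The update modifies df1 in place, and NaN values in df2 leave the existing df1 values untouched: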

import pandas as pd
import numpy as np

NaN = np.nan

df1 = pd.DataFrame([[1,2  ],[3,NaN],[NaN,NaN]], columns = ('X','Y'))
df2 = pd.DataFrame([[1,NaN],[3,4  ],[NaN,6  ]], columns = ('X','Y'))

df1.update(df2)

print(df1)
# output
#    X    Y
# 0  1.0  2.0
# 1  3.0  4.0
# 2  NaN  6.0
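
Applied to the frames from the question, the same idea might look like the sketch below. It assumes df1 is first pivoted to wide format and that both frames are aligned on a (STATE, YEAR) index; the variable names wide and filled are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df1 = pd.DataFrame({
    "STATE": ["ALABAMA", "ALABAMA", "ALABAMA"],
    "YEAR": [1962, 1962, 1963],
    "EVENT_TYPE": ["Tornado", "Flood", "Thunderstorm"],
    "DAMAGE": [27, 7, 12],
})
df2 = pd.DataFrame({
    "STATE": ["ALABAMA", "ALABAMA"],
    "YEAR": [1962, 1963],
    "TORNADO": np.nan,
    "THUNDERSTORM": np.nan,
    "FLOOD": np.nan,
})

cols = ["STATE", "YEAR"]

# update() aligns on index and column labels, so reshape df1 first
# and give both frames the same (STATE, YEAR) index.
wide = df1.pivot(index=cols, columns="EVENT_TYPE", values="DAMAGE")
wide.columns = wide.columns.str.upper()

filled = df2.set_index(cols)
filled.update(wide)            # in place; only non-NaN values are copied
filled = filled.reset_index()  # back to STATE and YEAR as ordinary columns

print(filled)
```

Because update only touches labels that already exist in the target, rows or columns present in wide but absent from df2 are ignored.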
Answered By: C-3PO

Concatenate the two DataFrames row-wise (axis=0 stacks the rows of df2 below those of df1):

pd.concat([df1, df2], axis=0)

To also replace NaN with 0, or whatever value you prefer, chain fillna:

pd.concat([df1, df2], axis=0).fillna(0)
Answered By: Captain Caveman