Match, update and replace values from one dataset to another in Pandas

Question:

I have two datasets, one has old data and one has updated data. I’d like to create a new dataset by updating values based on if the area, date and column values match.

Data

df1

area    date        aa  bb  cc  
japan   10/1/2027   1   0   0   
us      1/1/2022    5   5   5   
fiji    11/2/2026   1   1   1   
                
                

df2

area        date        aa  bb  cc  stat
japan       10/1/2027   0   5   5   yes
fiji        11/2/2026   0   0   10  no
                

I have two datasets. I wish to replace the values in [aa], [bb], and [cc] columns of df2 with the updated values from df1 if we have the same date and area values. The aa, bb, and cc column are replaced with the updated values.

Desired

area        date        aa  bb  cc  stat
japan       10/1/2027   1   0   0   yes
fiji        11/2/2026   1   1   1   no

Doing

df['date'] = df.date.apply(lambda x: np.nan if x == ' ' else x)

I am not exactly sure how to set this up, however, I have an idea. Any suggestion is appreciated

Asked By: Lynn

||

Answers:

Using .merge and making sure date columns in both dfs are set to datetime.

df1["date"] = pd.to_datetime(df1["date"])
df2["date"] = pd.to_datetime(df2["date"])

df3 = pd.merge(left=df1, right=df2, on=["area", "date"], how="right").filter(regex=r".*(?<!_y)$")
df3.columns = df3.columns.str.split("_").str[0]
print(df3)

    area       date  aa  bb  cc stat
0  japan 2027-10-01   1   0   0  yes
1   fiji 2026-11-02   1   1   1   no
Answered By: Jason Baker

I think this can possibly be simplified to:

output = df1[df1['area'].isin(df2['area']) & df1['date'].isin(df2['date'])]

OUTPUT:

area        date        aa  bb  cc  stat
japan       10/1/2027   1   0   0   yes
fiji        11/2/2026   1   1   1   no

Even when df1 looks like this:

DF1:

    area       date  aa  bb  cc
0  japan  10/1/2027   1   0   0
1     us   1/1/2022   5   5   5
2   fiji  11/2/2026   1   1   1
3   fiji  12/5/2025   9   9   9
Answered By: ScottC

You can merge and combine_first:

cols = ['area', 'date']
out = (df2[cols].merge(df1, on=cols, how='left')
       .combine_first(df2)[df2.columns]
      )

output:

    area       date  aa  bb  cc stat
0  japan  10/1/2027   1   0   0  yes
1   fiji  11/2/2026   1   1   1   no
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.