Pandas: update and add rows in one DataFrame using a key column from another DataFrame
Question:
I have 2 data frames with identical columns. Column ‘key’ will have unique values.
Data frame 1:-
A B key C
0 1 k1 2
1 2 k2 3
2 3 k3 5
Data frame 2:-
A B key C
4 5 k1 2
1 2 k2 3
2 3 k4 5
I would like to update rows in DataFrame 1 with the values from DataFrame 2 wherever a key in DataFrame 2 matches a key in DataFrame 1.
If a key is new, the entire row from DataFrame 2 should be added to DataFrame 1.
The final output DataFrame should look like this, with the same columns:
A B key C
4 5 k1 2 --> update
1 2 k2 3 --> no changes
2 3 k3 5 --> no changes
2 3 k4 5 --> new row
I tried the code below. I need only the 4 columns ‘A’, ‘B’, ‘key’, ‘C’, without any suffixes after the merge.
df3 = df1.merge(df2,on='key',how='outer')
>>> df3
A_x B_x key C_x A_y B_y C_y
0 0.0 1.0 k1 2.0 4.0 5.0 2.0
1 1.0 2.0 k2 3.0 1.0 2.0 3.0
2 2.0 3.0 k3 5.0 NaN NaN NaN
3 NaN NaN k4 NaN 2.0 3.0 5.0
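For reference, the suffixed columns produced by the outer merge above can be collapsed back into the original four columns. The following is only a sketch under stated assumptions: the suffix names `_old` and `_new` and the variable names `merged`, `value_cols` and `result` are illustrative, and it assumes df2 never contains a genuine NaN that should overwrite a df1 value.

```python
import pandas as pd

# Sample frames from the question.
df1 = pd.DataFrame({"A": [0, 1, 2], "B": [1, 2, 3],
                    "key": ["k1", "k2", "k3"], "C": [2, 3, 5]})
df2 = pd.DataFrame({"A": [4, 1, 2], "B": [5, 2, 3],
                    "key": ["k1", "k2", "k4"], "C": [2, 3, 5]})

# Outer merge with explicit suffixes (names chosen for illustration).
merged = df1.merge(df2, on="key", how="outer", suffixes=("_old", "_new"))

# For each value column, prefer the df2 value and fall back to df1
# where df2 has no row for that key.
value_cols = [c for c in df1.columns if c != "key"]
for col in value_cols:
    merged[col] = merged[f"{col}_new"].fillna(merged[f"{col}_old"])

# Keep only the original columns, in the original order.
result = (merged[list(df1.columns)]
          .sort_values("key")
          .reset_index(drop=True))
print(result)
```

The value columns come back as floats because the merge introduces NaN for keys missing on one side; cast them back if integer columns are needed.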
Answers:
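Try concatenating the two frames and removing duplicates on the key, keeping the last occurrence (the row from df2):
df3 = pd.concat([df1, df2]).drop_duplicates(subset='key', keep='last')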
Try this:
import pandas as pd

df1 = pd.DataFrame({'key': ['k1', 'k2', 'k3'], 'A': [0, 1, 2], 'B': [1, 2, 3], 'C': [2, 3, 5]})
print(df1)
df2 = pd.DataFrame({'key': ['k1', 'k2', 'k4'], 'A': [4, 1, 2], 'B': [5, 2, 3], 'C': [2, 3, 5]})
print(df2)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
df3 = pd.concat([df1, df2], ignore_index=True)
# Keep the last row for each key, so rows from df2 win over rows from df1.
df3 = df3.drop_duplicates(subset=['key'], keep='last')
df3 = df3.sort_values(by=['key'], ascending=True)
print(df3)
It seems like you’re looking for combine_first.
a = df2.set_index('key')
b = df1.set_index('key')
(a.combine_first(b)
.reset_index()
.reindex(columns=df1.columns))
A B key C
0 4.0 5.0 k1 2.0
1 1.0 2.0 k2 3.0
2 2.0 3.0 k3 5.0
3 2.0 3.0 k4 5.0
This assumes both DataFrames use the same column as their index (for example, after set_index('key')):
df3 = df1.combine_first(df2)
df3.update(df2)
First, you need to indicate index columns:
df1.set_index('key', inplace=True)
df2.set_index('key', inplace=True)
Then, combine the dataframes to get all the index keys in place (this will not update the df1 values! See: combine_first manual):
df1 = df1.combine_first(df2)
The last step is to update the values in df1 with those from df2 and reset the index:
df1.update(df2)
df1.reset_index(inplace=True)
After setting the same column as index on each dataframe:
def df_upsert(df1, df2):
    df = df1.combine_first(df2)
    df.update(df2)
    return df
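As a usage sketch, the function above can be applied to the sample data from the question. The variable names `old`, `new` and `result` are illustrative, and the final `reindex` is an added step to restore the question's column order.

```python
import pandas as pd

def df_upsert(df1, df2):
    # Add keys that exist only in df2, then overwrite matching keys
    # with the non-NA values from df2.
    df = df1.combine_first(df2)
    df.update(df2)
    return df

# Sample frames from the question, indexed by the shared 'key' column.
old = pd.DataFrame({"A": [0, 1, 2], "B": [1, 2, 3],
                    "key": ["k1", "k2", "k3"], "C": [2, 3, 5]}).set_index("key")
new = pd.DataFrame({"A": [4, 1, 2], "B": [5, 2, 3],
                    "key": ["k1", "k2", "k4"], "C": [2, 3, 5]}).set_index("key")

# Upsert, then move 'key' back to a column in its original position.
result = (df_upsert(old, new)
          .reset_index()
          .reindex(columns=["A", "B", "key", "C"]))
print(result)
```

Because `update` skips NA values, a NaN in df2 will not overwrite an existing value in df1. Depending on the pandas version, the numeric columns may come back as floats.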