Python Pandas: Concatenate and update dataframe values from another dataframe
Question:
I have this 2 dataframes.
import pandas as pd
data1 = {
'Product': ['product1', 'product2', 'product3'],
'Price': [200, 300, 400],
'Quantity': [10, 5, 20],
}
df1 = pd.DataFrame(data1, columns= ['Product','Price','Quantity'])
print(df1)
data2 = {
'Product': ['product1','product2','product4'],
'Price': [200, 1000,50],
}
df2 = pd.DataFrame(data2, columns= ['Product','Price'])
df1:
Product Price Quantity
0 product1 200 10
1 product2 300 5
2 product3 400 20
df2:
Product Price
0 product1 200
1 product2 1000
2 product4 50
I search for concatening and updating both to obtain this dataframe:
Product Price Quantity
0 product1 200 10
1 product2 1000 5
2 product3 -1 20
2 product4 50 NaN
This means that:
- New product in df2 (product4) has to be added with the available information (Price)
- Product which is not in df2 should be kept with Price set to -1
- Product in df1 and df2 has to only has his price updated (product2)
- All other Products are kept the same.
Thank you for your help.
Answers:
Here’s a merge based solution:
- Get the
Quantity
value from df1
by merging the two DataFrames on the Product
. The Price
from df1
is dropped so that those prices aren’t added to the final DataFrame. This is an outer merge to ensure the result has products from both df1
and df2
.
- The above step almost gets you to the desired result, except for needing to replace missing prices with
-1
and sorting based on product.
final = df2.merge(df1.drop('Price', axis=1), on='Product', how='outer')
final['Price'].fillna(-1, inplace=True)
final.sort_values('Product', inplace=True)
Result:
Product Price Quantity
0 product1 200.0 10.0
1 product2 1000.0 5.0
3 product3 -1.0 20.0
2 product4 50.0 NaN
I have this 2 dataframes.
import pandas as pd
data1 = {
'Product': ['product1', 'product2', 'product3'],
'Price': [200, 300, 400],
'Quantity': [10, 5, 20],
}
df1 = pd.DataFrame(data1, columns= ['Product','Price','Quantity'])
print(df1)
data2 = {
'Product': ['product1','product2','product4'],
'Price': [200, 1000,50],
}
df2 = pd.DataFrame(data2, columns= ['Product','Price'])
df1:
Product Price Quantity
0 product1 200 10
1 product2 300 5
2 product3 400 20
df2:
Product Price
0 product1 200
1 product2 1000
2 product4 50
I search for concatening and updating both to obtain this dataframe:
Product Price Quantity
0 product1 200 10
1 product2 1000 5
2 product3 -1 20
2 product4 50 NaN
This means that:
- New product in df2 (product4) has to be added with the available information (Price)
- Product which is not in df2 should be kept with Price set to -1
- Product in df1 and df2 has to only has his price updated (product2)
- All other Products are kept the same.
Thank you for your help.
Here’s a merge based solution:
- Get the
Quantity
value fromdf1
by merging the two DataFrames on theProduct
. ThePrice
fromdf1
is dropped so that those prices aren’t added to the final DataFrame. This is an outer merge to ensure the result has products from bothdf1
anddf2
. - The above step almost gets you to the desired result, except for needing to replace missing prices with
-1
and sorting based on product.
final = df2.merge(df1.drop('Price', axis=1), on='Product', how='outer')
final['Price'].fillna(-1, inplace=True)
final.sort_values('Product', inplace=True)
Result:
Product Price Quantity
0 product1 200.0 10.0
1 product2 1000.0 5.0
3 product3 -1.0 20.0
2 product4 50.0 NaN