How to merge two dataframes and retain values in multiple rows?
Question:
Is the merge function the appropriate way to join Dataframes 1 and 2 to get the Desired Dataframe?
Dataframe 1:
animal | category | Add-Date |
---|---|---|
wolf | land | 24/09/22 |
eagle | sky | 24/09/22 |
robin | sky | 24/09/22 |
bear | land | 24/09/22 |
cod | water | 24/09/22 |
salmon | water | 24/09/22 |
Dataframe 2:
category | Tier |
---|---|
land | 1 |
sky | 2 |
Desired Dataframe:
animal | category | Add-Date | Tier |
---|---|---|---|
wolf | land | 24/09/22 | 1 |
eagle | sky | 24/09/22 | 2 |
robin | sky | 24/09/22 | 2 |
bear | land | 24/09/22 | 1 |
The Desired Dataframe is Dataframe 1 with irrelevant categories removed and the appropriate Tier brought across with the category.
I have tried merge, join, etc., but I am unsure of the best approach or whether I am making an error.
Any help with the code or the method to use is much appreciated.
Answers:
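import pandas as pd

# Left join: keep every row of df1 and bring Tier across where the category matches
df_merged = pd.merge(df1, df2, how='left', on='category')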
out:
animal category Add-Date Tier
0 wolf land 24/09/22 1.0
1 eagle sky 24/09/22 2.0
2 robin sky 24/09/22 2.0
3 bear land 24/09/22 1.0
4 cod water 24/09/22 NaN
5 salmon water 24/09/22 NaN
You can drop the rows with NaN values (the categories missing from df2) if needed:
df_merged = pd.merge(df1, df2, how='left', on='category').dropna()
out:
animal category Add-Date Tier
0 wolf land 24/09/22 1.0
1 eagle sky 24/09/22 2.0
2 robin sky 24/09/22 2.0
3 bear land 24/09/22 1.0
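Note that Tier comes back as a float (1.0, 2.0) because the left merge introduces NaN for the unmatched rows. A minimal self-contained sketch of the same approach, assuming the frames are built exactly as in the question, that drops only the rows where Tier is missing and casts the column back to integers:

```python
import pandas as pd

# Frames rebuilt from the tables in the question
df1 = pd.DataFrame({
    "animal": ["wolf", "eagle", "robin", "bear", "cod", "salmon"],
    "category": ["land", "sky", "sky", "land", "water", "water"],
    "Add-Date": ["24/09/22"] * 6,
})
df2 = pd.DataFrame({"category": ["land", "sky"], "Tier": [1, 2]})

df_merged = (
    pd.merge(df1, df2, how="left", on="category")
    .dropna(subset=["Tier"])   # only look at Tier when deciding what to drop
    .astype({"Tier": int})     # NaN is gone, so the column can be integer again
    .reset_index(drop=True)
)
print(df_merged)
```

Passing subset=["Tier"] is a design choice: a plain dropna() would also discard rows that happen to have NaN in other columns such as Add-Date.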
You can perform an inner join with pandas.merge to grab the Tier column from df2. Categories that do not appear in df2 (here, water) are dropped.
out = df1.merge(df2, on='category')
display(out)
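For reference, here is a runnable sketch of the inner join, assuming the frames are built exactly as in the question. The validate argument is an optional addition, not part of the answer above; it makes pandas raise an error if df2 ever contains the same category twice, which would otherwise silently duplicate rows.

```python
import pandas as pd

# Frames rebuilt from the tables in the question
df1 = pd.DataFrame({
    "animal": ["wolf", "eagle", "robin", "bear", "cod", "salmon"],
    "category": ["land", "sky", "sky", "land", "water", "water"],
    "Add-Date": ["24/09/22"] * 6,
})
df2 = pd.DataFrame({"category": ["land", "sky"], "Tier": [1, 2]})

# Inner join: only categories present in both frames survive,
# so the water rows are removed and Tier stays an integer column.
out = df1.merge(df2, on="category", how="inner", validate="many_to_one")
print(out)
```

Outside a notebook, print(out) plays the role of display(out). Row order of an inner merge has varied between pandas versions, so sort explicitly if the order matters.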