How to find the desired values between two dataframes in Python

Question:

I have two dataframes.

One is music.

name           Date        Edition  Song_ID  Singer_ID
LA             01.05.2009  1        1        1
Second         13.07.2009  1        2        2
Mexico         13.07.2009  1        3        1
Let’s go       13.09.2009  1        4        3
Hello          18.09.2009  1        5        (4,5)
Don’t give up  12.02.2010  2        6        (5,6)
ZIC ZAC        18.03.2010  2        7        7
Blablabla      14.04.2010  2        8        2
Oh la la       14.05.2011  3        9        4
Food First     14.05.2011  3        10       5
La Vie est..   17.06.2011  3        11       8
Jajajajajaja   13.07.2011  3        12       9

And another dataframe called singer

Singer     nationality  Singer_ID
JT Watson  USA          1
Rafinha    Brazil       2
Juan Casa  Spain        3
Kidi       USA          4
Dede       USA          5
Briana     USA          6
Jay Ado    UK           7
Dani       Australia    8
Mike Rich  USA          9

I would like to know which Edition has the most singers from the USA involved, but the information is split across two different dataframes.

What I have done so far is:

singer['nationality'].value_counts()['USA']

But this only shows that 5 singers are from the USA. Both dataframes share a column, called Singer_ID.

Asked By: Jaime

||

Answers:

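You need to merge the two dataframes on the column they share, Singer_ID (see https://pandas.pydata.org/docs/reference/api/pandas.merge.html). Note that cells such as (4,5), which hold several IDs at once, will not match a single Singer_ID as they stand.
Then count the USA rows per Edition and keep the Edition with the highest count.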

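# Inner join: keeps only the rows whose Singer_ID appears in both dataframes
merged = singer.merge(music, on="Singer_ID")
# Total number of merged rows with a singer from the USA, across all editions
merged['nationality'].value_counts()['USA']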
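The multi-singer rows need one extra step before the merge. Below is a minimal sketch, assuming the Singer_ID cells such as (4,5) are stored as Python tuples (the question does not say how they are stored; if they are strings they would have to be parsed first). Only the columns needed for the merge are reproduced, and the names exploded and usa_rows are chosen here for illustration.

```python
import pandas as pd

# Data copied from the question (name and Date columns left out).
# Assumption: songs with several singers hold a tuple of IDs.
music = pd.DataFrame({
    "Edition": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "Song_ID": list(range(1, 13)),
    "Singer_ID": [1, 2, 1, 3, (4, 5), (5, 6), 7, 2, 4, 5, 8, 9],
})
singer = pd.DataFrame({
    "Singer": ["JT Watson", "Rafinha", "Juan Casa", "Kidi", "Dede",
               "Briana", "Jay Ado", "Dani", "Mike Rich"],
    "nationality": ["USA", "Brazil", "Spain", "USA", "USA",
                    "USA", "UK", "Australia", "USA"],
    "Singer_ID": list(range(1, 10)),
})

# explode() gives each tuple element its own row; scalar IDs pass through.
# The exploded column has object dtype, so cast it to match singer.Singer_ID.
exploded = (
    music.explode("Singer_ID", ignore_index=True)
         .astype({"Singer_ID": "int64"})
)

# Left join keeps every (song, singer) pair, even if a singer were unknown.
merged = exploded.merge(singer, on="Singer_ID", how="left")
usa_rows = int((merged["nationality"] == "USA").sum())
print(len(merged), usa_rows)
```

With this step each (song, singer) pair becomes its own row, so the songs with two singers contribute two rows to the merge instead of being dropped.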



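editions = merged.groupby("Edition")
# or print(merged.groupby(["Edition", "nationality"])["nationality"].count())
max_value = 0
best_edition = None  # stays None if no edition has a singer from the USA
for edition, df in editions:
    # Count the USA rows; unlike value_counts()["USA"], this gives 0
    # instead of raising a KeyError when an edition has no USA singer
    nbr_usa = (df["nationality"] == "USA").sum()
    if nbr_usa > max_value:
        best_edition = edition
        max_value = nbr_usa
print(best_edition, max_value)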
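The loop can also be written without explicit iteration. The sketch below assumes merged already has one row per (song, singer) pair, with (4,5) read as singers 4 and 5 both performing the song; the frame is rebuilt by hand here so the example runs on its own. The names usa, appearances and distinct are illustrative. It shows two possible readings of "most singers": counting every appearance, or counting each singer once per edition.

```python
import pandas as pd

# Singer_ID -> nationality, taken from the singer table in the question.
nationality = {1: "USA", 2: "Brazil", 3: "Spain", 4: "USA", 5: "USA",
               6: "USA", 7: "UK", 8: "Australia", 9: "USA"}

# One row per (song, singer) pair, assuming (4,5) means singers 4 and 5.
merged = pd.DataFrame({
    "Edition":   [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Singer_ID": [1, 2, 1, 3, 4, 5, 5, 6, 7, 2, 4, 5, 8, 9],
})
merged["nationality"] = merged["Singer_ID"].map(nationality)

usa = merged[merged["nationality"] == "USA"]

# Reading 1: every appearance of a USA singer counts.
appearances = usa.groupby("Edition").size()
# Reading 2: each USA singer counts once per edition.
distinct = usa.groupby("Edition")["Singer_ID"].nunique()

print("most appearances:", appearances.idxmax(), appearances.max())
# idxmax() reports only the first maximum, so list ties explicitly.
print("most distinct singers:", distinct[distinct == distinct.max()].index.tolist())
```

On the question's data the two readings differ: Edition 1 has the most appearances (4), while Editions 1 and 3 tie at 3 distinct USA singers each, so it is worth deciding which count is wanted.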
Answered By: Achille G