How to find the desired values between two dataframes in python
Question:
I have two dataframes. The first one is `music`:
name | Date | Edition | Song_ID | Singer_ID |
---|---|---|---|---|
LA | 01.05.2009 | 1 | 1 | 1 |
Second | 13.07.2009 | 1 | 2 | 2 |
Mexico | 13.07.2009 | 1 | 3 | 1 |
Let’s go | 13.09.2009 | 1 | 4 | 3 |
Hello | 18.09.2009 | 1 | 5 | (4,5) |
Don’t give up | 12.02.2010 | 2 | 6 | (5,6) |
ZIC ZAC | 18.03.2010 | 2 | 7 | 7 |
Blablabla | 14.04.2010 | 2 | 8 | 2 |
Oh la la | 14.05.2011 | 3 | 9 | 4 |
Food First | 14.05.2011 | 3 | 10 | 5 |
La Vie est.. | 17.06.2011 | 3 | 11 | 8 |
Jajajajajaja | 13.07.2011 | 3 | 12 | 9 |
And another dataframe called `singer`:
Singer | nationality | Singer_ID |
---|---|---|
JT Watson | USA | 1 |
Rafinha | Brazil | 2 |
Juan Casa | Spain | 3 |
Kidi | USA | 4 |
Dede | USA | 5 |
Briana | USA | 6 |
Jay Ado | UK | 7 |
Dani | Australia | 8 |
Mike Rich | USA | 9 |
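To make the question reproducible, the two tables can be rebuilt as pandas DataFrames. This is a minimal sketch that assumes the `Singer_ID` column of `music` is stored as text, because two songs list several singers as `(4,5)` and `(5,6)`; the real dtype in the asker's data is not stated.

```python
import pandas as pd

# Rebuild the `music` table from the question.
# Assumption: Singer_ID is kept as text so multi-singer cells like "(4,5)" fit.
music = pd.DataFrame({
    "name": ["LA", "Second", "Mexico", "Let’s go", "Hello", "Don’t give up",
             "ZIC ZAC", "Blablabla", "Oh la la", "Food First", "La Vie est..",
             "Jajajajajaja"],
    "Date": ["01.05.2009", "13.07.2009", "13.07.2009", "13.09.2009",
             "18.09.2009", "12.02.2010", "18.03.2010", "14.04.2010",
             "14.05.2011", "14.05.2011", "17.06.2011", "13.07.2011"],
    "Edition": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "Song_ID": list(range(1, 13)),
    "Singer_ID": ["1", "2", "1", "3", "(4,5)", "(5,6)", "7", "2", "4", "5",
                  "8", "9"],
})

# Rebuild the `singer` table; here Singer_ID is a plain integer per singer.
singer = pd.DataFrame({
    "Singer": ["JT Watson", "Rafinha", "Juan Casa", "Kidi", "Dede", "Briana",
               "Jay Ado", "Dani", "Mike Rich"],
    "nationality": ["USA", "Brazil", "Spain", "USA", "USA", "USA", "UK",
                    "Australia", "USA"],
    "Singer_ID": list(range(1, 10)),
})

# The asker's first attempt: counts USA singers, ignoring editions entirely.
print(singer["nationality"].value_counts()["USA"])
```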
I would like to know which edition involves the most singers from the USA, but the information is spread across two different dataframes.
What I have done so far is this:
```python
singer['nationality'].value_counts()['USA']
```
But this only shows that 5 singers are from the USA. Both dataframes share a column called `Singer_ID`.
Answers:
You need to merge the two dataframes on the shared key, `Singer_ID`:
https://pandas.pydata.org/docs/reference/api/pandas.merge.html
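```python
# Join each song to its singer via the shared Singer_ID key.
# Assumes Singer_ID has the same dtype in both frames; multi-singer cells
# such as (4,5) will not match a single ID.
merged = singer.merge(music, on="Singer_ID")
merged['nationality'].value_counts()['USA']  # USA appearances over all editions

editions = merged.groupby("Edition")
# or print(merged.groupby(["Edition", "nationality"])["nationality"].count())

max_value = 0
best_edition = 0
for edition, df in editions:
    # .get avoids a KeyError for an edition without any USA singer
    nbr_usa = df["nationality"].value_counts().get("USA", 0)
    if nbr_usa > max_value:
        best_edition = edition
        max_value = nbr_usa

print(best_edition, max_value)
```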
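The loop can also be replaced by a filter, a `groupby(...).size()` and `idxmax()`. The sketch below additionally splits cells such as `(4,5)` into one row per singer with `explode`, so songs with several singers are not silently lost in the merge. It uses compact copies of the question's tables and assumes `music.Singer_ID` is stored as text; adjust the parsing if the real column holds tuples or lists instead.

```python
import pandas as pd

# Compact copies of the question's tables (only the columns needed here).
# Assumption: multi-singer songs store Singer_ID as strings such as "(4,5)".
music = pd.DataFrame({
    "Edition": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "Singer_ID": ["1", "2", "1", "3", "(4,5)", "(5,6)", "7", "2", "4", "5",
                  "8", "9"],
})
singer = pd.DataFrame({
    "Singer_ID": list(range(1, 10)),
    "nationality": ["USA", "Brazil", "Spain", "USA", "USA", "USA", "UK",
                    "Australia", "USA"],
})

# One row per (song, singer): drop the parentheses, split on commas,
# explode the resulting lists, then convert the IDs to integers.
music_long = (
    music.assign(Singer_ID=music["Singer_ID"].str.strip("()").str.split(","))
    .explode("Singer_ID")
    .astype({"Singer_ID": int})
)

# Now both Singer_ID columns are integers, so the merge matches every singer.
merged = music_long.merge(singer, on="Singer_ID")

# Count USA singer appearances per edition and pick the largest.
usa_per_edition = merged[merged["nationality"] == "USA"].groupby("Edition").size()
best_edition = usa_per_edition.idxmax()

print(usa_per_edition.to_dict())
print("Edition with the most USA singers:", best_edition)
```

This counts appearances, so a singer with two songs in the same edition (JT Watson in edition 1) is counted twice. To count distinct singers instead, use `.groupby("Edition")["Singer_ID"].nunique()` on the filtered frame; with this data that produces a tie between editions 1 and 3, and `idxmax()` returns the first of them.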