Filter data frame for common values and rank
Question:
If I have data that looks like this:
name times_used gender
0 Sophia 42261 Girl
1 Jacob 42164 Boy
2 Emma 35951 Girl
3 Ethan 34523 Boy
4 Mason 34195 Boy
5 William 34130 Boy
6 Olivia 34128 Girl
7 Jayden 33962 Boy
8 Michael 33842 Boy
9 Noah 33098 Boy
10 Alexander 32292 Boy
11 Daniel 30907 Boy
12 Aiden 30868 Boy
13 Ava 30765 Girl
Could someone give me a tip on how to use Pandas to find the names (say, the top 10) that are used for both the Girl and Boy genders? The column times_used is an int value of how many times that name was chosen for a child.
import pandas as pd

df = pd.read_csv('../resource/lib/public/babynames.csv')
cols = ['name','times_used','gender']
df.columns = cols
print(df)
Answers:
Here is one way to do it, keeping the top_count most-used names within each gender:
top_count=3
df[df.groupby(['gender'])['times_used'].transform(
lambda x: x.nlargest(top_count)).notna()
].sort_values(['gender','times_used'], ascending=False)
name times_used gender
0 Sophia 42261 Girl
2 Emma 35951 Girl
6 Olivia 34128 Girl
1 Jacob 42164 Boy
3 Ethan 34523 Boy
4 Mason 34195 Boy
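Note that the snippet above ranks names within each gender separately. If the goal is names that occur under both genders, one possible approach is to count the distinct genders per name and then rank by combined usage. This is only a sketch: the sample rows below are made up for illustration (the excerpt in the question contains no shared names), and the column name total_used is an assumption.

```python
import pandas as pd

# Hypothetical sample data (not from the question): Riley and Jordan
# appear under both genders, Sophia and Jacob under only one.
df = pd.DataFrame({
    "name": ["Riley", "Riley", "Jordan", "Jordan", "Sophia", "Jacob"],
    "times_used": [500, 300, 200, 450, 42261, 42164],
    "gender": ["Girl", "Boy", "Girl", "Boy", "Girl", "Boy"],
})

top_count = 10

# Keep only rows whose name occurs under more than one distinct gender.
both = df[df.groupby("name")["gender"].transform("nunique") > 1]

# Sum the counts across genders and keep the top_count largest totals.
ranked = (
    both.groupby("name")["times_used"]
    .sum()
    .nlargest(top_count)
    .reset_index(name="total_used")
)
print(ranked)
```

Ranking by the combined total is a design choice; ranking by the smaller of the two per-gender counts would instead favor names that are popular for both genders.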
df = pd.read_csv('../resource/lib/public/babynames.csv')
cols = ['name','times_used','gender']
df.columns = cols
#identify rows whose 'name' value repeats an earlier row
duplicateRows = df[df.duplicated(['name'])]
#view duplicate rows
print(duplicateRows)
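By default, duplicated() flags only the second and later occurrences of a name, so each shared name shows up once, with whichever gender came later in the file. A variant that might be more useful here passes keep=False to flag every occurrence and then pivots the result so the Girl and Boy counts sit side by side. The sample data and the total column are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data (not from the question).
df = pd.DataFrame({
    "name": ["Riley", "Riley", "Jordan", "Jordan", "Sophia", "Jacob"],
    "times_used": [500, 300, 200, 450, 42261, 42164],
    "gender": ["Girl", "Boy", "Girl", "Boy", "Girl", "Boy"],
})

# keep=False marks every row whose name occurs more than once.
shared = df[df.duplicated("name", keep=False)]

# One row per name, one column per gender.
wide = shared.pivot_table(
    index="name", columns="gender", values="times_used", aggfunc="sum"
)

# Drop names that repeat only within a single gender (they would have a NaN).
wide = wide.dropna()

# Rank by combined usage, most used first.
wide["total"] = wide["Boy"] + wide["Girl"]
wide = wide.sort_values("total", ascending=False)
print(wide.head(10))
```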