Filter data frame for common values and rank

Question:

If I have data that looks like this:

              name  times_used gender
0           Sophia       42261   Girl
1            Jacob       42164    Boy
2             Emma       35951   Girl
3            Ethan       34523    Boy
4            Mason       34195    Boy
5          William       34130    Boy
6           Olivia       34128   Girl
7           Jayden       33962    Boy
8          Michael       33842    Boy
9             Noah       33098    Boy
10       Alexander       32292    Boy
11          Daniel       30907    Boy
12           Aiden       30868    Boy
13             Ava       30765   Girl

Could someone give me a tip on how to use Pandas to find the names (say, the top 10) that are used for both the Girl and Boy genders? The column times_used is an int value of how many times that name was chosen for a child.

import pandas as pd

df = pd.read_csv('../resource/lib/public/babynames.csv')
cols = ['name','times_used','gender']
df.columns = cols

print(df)
Asked By: bbartling

||

Answers:

Here is one way to do it. This returns the top_count most-used names within each gender:

top_count=3
df[df.groupby(['gender'])['times_used'].transform(
    lambda x: x.nlargest(top_count)).notna()
  ].sort_values(['gender','times_used'], ascending=False)


    name    times_used  gender
0   Sophia       42261  Girl
2   Emma         35951  Girl
6   Olivia       34128  Girl
1   Jacob        42164  Boy
3   Ethan        34523  Boy
4   Mason        34195  Boy
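
As a hedged alternative sketch of the same per-gender ranking, sorting first and then taking the head of each group avoids the lambda. The small frame below is built from a few rows of the question's sample rather than read from the CSV, and the name top_per_gender is only illustrative.

```python
import pandas as pd

# A few rows copied from the question's sample data
df = pd.DataFrame({
    "name": ["Sophia", "Jacob", "Emma", "Ethan", "Mason", "William", "Olivia"],
    "times_used": [42261, 42164, 35951, 34523, 34195, 34130, 34128],
    "gender": ["Girl", "Boy", "Girl", "Boy", "Boy", "Boy", "Girl"],
})

top_count = 3

top_per_gender = (
    df.sort_values("times_used", ascending=False)  # most-used names first
      .groupby("gender")
      .head(top_count)                             # first top_count rows of each gender
      .sort_values(["gender", "times_used"], ascending=False)
)
print(top_per_gender)
```

On this sample it selects the same six rows as the output shown above.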
Answered By: Naveed
import pandas as pd

df = pd.read_csv('../resource/lib/public/babynames.csv')
cols = ['name','times_used','gender']
df.columns = cols


# identify rows whose 'name' already appeared in an earlier row
# (only the second and later occurrences of each name are flagged)
duplicateRows = df[df.duplicated(['name'])]

# view duplicate rows
print(duplicateRows)
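
To go from the duplicated names to a ranking, one possible sketch is to pivot so each gender becomes a column, keep the names that have a count in both, and sort by the combined count. The rows below are hypothetical (the sample in the question contains no shared names, so the counts for Riley and Jordan are made up), and by_gender, shared, total, and top_shared are illustrative names, not part of the original data.

```python
import pandas as pd

# Hypothetical rows for illustration only; the Riley/Jordan counts are made up
df = pd.DataFrame({
    "name": ["Riley", "Riley", "Jordan", "Jordan", "Sophia", "Jacob"],
    "times_used": [500, 300, 200, 900, 42261, 42164],
    "gender": ["Girl", "Boy", "Girl", "Boy", "Girl", "Boy"],
})

# One row per name, one column per gender; NaN where a name has no rows for that gender
by_gender = df.pivot_table(index="name", columns="gender",
                           values="times_used", aggfunc="sum")

# Keep only names that have a count for both genders
shared = by_gender.dropna(subset=["Boy", "Girl"]).copy()

# Rank by combined usage and take the top 10
shared["total"] = shared["Boy"] + shared["Girl"]
top_shared = shared.sort_values("total", ascending=False).head(10)
print(top_shared)
```

Ranking by the combined count is an assumption about what "top" means here; sorting by the smaller of the two columns instead would favor names that are common for both genders.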
Answered By: bbartling