Grouping by Aggregate Functions in #Pandas

Question:

I’m trying to find out which occupation has the max mean salary.
I’ve tried
df.groupby('Occupation').agg({'Salary':'mean'})

I think I’ve figured out how to get the max mean salary
but I can’t figure out how to get the specific occupation title.
Any tips? Thank you!!

Asked By: kristinek


Answers:

When you perform a groupby, the columns you group by become the index of the result. Because you are only aggregating the mean salary, the result has a single Salary column indexed by occupation, so you can use idxmax to retrieve the index label (the occupation) where the maximum mean salary occurs. However, if multiple occupations share the same maximum mean salary, idxmax returns only the first of them.

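import pandas as pd
df = pd.DataFrame({'Occupation':list('aaabbbccc'),'Salary':[1,2,3,4,5,6,7,8,9]})
# Select the Salary column first so the result is a Series and idxmax returns a single label
occupation_max_salary = df.groupby('Occupation')['Salary'].mean().idxmax()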

occupation_max_salary is 'c' as expected.
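To make the difference between the two result shapes concrete, here is a minimal sketch using the same toy data. The variable names (mean_df, mean_series, and so on) are only illustrative. It assumes that agg with a dict yields a one-column DataFrame, while selecting the column before aggregating yields a Series.

```python
import pandas as pd

df = pd.DataFrame({'Occupation': list('aaabbbccc'),
                   'Salary': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# agg with a dict returns a DataFrame indexed by Occupation
mean_df = df.groupby('Occupation').agg({'Salary': 'mean'})
# idxmax on a DataFrame returns one label per column, so look it up by column name
top_from_df = mean_df.idxmax()['Salary']

# Selecting the column first gives a Series, so idxmax returns a single label
mean_series = df.groupby('Occupation')['Salary'].mean()
top_from_series = mean_series.idxmax()

# The maximum mean salary itself
top_mean = mean_series.max()
```

Both top_from_df and top_from_series should hold the same occupation label. Looking the label up by column name avoids relying on positional indexing into a label-indexed Series.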

So if you need to account for possible ties in mean salary, then you can try the following:

df2 = pd.DataFrame({'Occupation':list('aaabbbccc'),'Salary':[1,2,3,7,8,9,7,8,9]})
salaries = df2.groupby('Occupation').agg({'Salary':'mean'})
# Keep every occupation whose mean salary equals the maximum
occupation_max_salary = salaries[salaries == salaries.max()].dropna().index.tolist()

In this case, occupation_max_salary is ['b','c']
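The same tie-handling idea can also be sketched on a Series, which avoids the dropna step. This is one possible variant, not the only way to do it, and the names mean_salary and tied_occupations are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({'Occupation': list('aaabbbccc'),
                    'Salary': [1, 2, 3, 7, 8, 9, 7, 8, 9]})

# Mean salary per occupation as a Series indexed by Occupation
mean_salary = df2.groupby('Occupation')['Salary'].mean()

# A boolean mask on a Series drops non-matching rows directly, so no dropna is needed
tied_occupations = mean_salary[mean_salary == mean_salary.max()].index.tolist()
```

Note that this compares floating-point means for exact equality. If the means come from values that do not divide evenly, a tolerance-based comparison such as numpy.isclose may be a safer choice.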

Answered By: Derek O