How to select the best row of a grouped DataFrame in pandas

Question

Let's say I have the following dataframe:

import pandas as pd

df = pd.DataFrame.from_dict({'class': ['A', 'A', 'A', 'B', 'B'],
                             'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                             'gender': ['m', 'f', 'm', 'f', 'f'],
                             'grade': [2, 3, 1, 2, 4]})
>>> print(df)
  class   name gender  grade
0     A   max1      m      2
1     A  lisa1      f      3
2     A   max3      m      1
3     B  lisa2      f      2
4     B  lisa3      f      4

I want to get the best student(s) of each class (a lower grade is better):

  class  grade   name gender
0     A      1   max3      m
1     B      2  lisa2      f

How would I achieve that with pandas?
Also, if the best grade is not unique (unlike in the example above), how can I select the first occurrence of the best grade, or aggregate the tied rows into a list?

To clarify, assume that lisa3 has grade 2 instead of 4. Then the result I am looking for would be:

First occurrence: same as above
Aggregating into a list:

  class  grade            name  gender
0     A      1          [max3]     [m]
1     B      2  [lisa2, lisa3]  [f, f]

Asked By: v.tralala

Answer 1

You can use pandas.DataFrame.groupby to group the students by class, use transform to get the minimum grade of each class, keep only the rows whose grade equals that minimum, and then aggregate the remaining rows of each class into lists with pandas.Series.tolist.

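import pandas as pd

# Same data as in the question, but lisa3 now has grade 2 (a tie in class B).
df = pd.DataFrame.from_dict({'class': ['A', 'A', 'A', 'B', 'B'],
                             'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                             'gender': ['m', 'f', 'm', 'f', 'f'],
                             'grade': [2, 3, 1, 2, 2]})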

temp = df[df.grade == df.groupby(['class'])['grade'].transform('min')]
temp.groupby(['class', 'grade']).agg(pd.Series.tolist)

This gives you the expected output:

                       name  gender
class grade                        
A     1              [max3]     [m]
B     2      [lisa2, lisa3]  [f, f]

transform('min') computes the minimum of each group and returns an object with the same index as the original, filled with the transformed values. Because a single column is selected here, the result is a Series in which every row holds the minimum grade of its class.

So df.groupby(['class'])['grade'].transform('min') gives us a Series with the values 1, 1, 1, 2, 2 (one per row of df), where 1 and 2 are the minimum values of their respective group, replicated for each row belonging to that group.
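
To make the intermediate steps visible, here is a minimal sketch of the transform-then-filter idea on the same data. The names group_min, is_best and best are only illustrative and do not appear in the code above.

```python
import pandas as pd

df = pd.DataFrame({
    'class': ['A', 'A', 'A', 'B', 'B'],
    'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
    'gender': ['m', 'f', 'm', 'f', 'f'],
    'grade': [2, 3, 1, 2, 2],
})

# Per-class minimum, broadcast back to every row (same index as df).
group_min = df.groupby('class')['grade'].transform('min')
print(group_min.tolist())     # [1, 1, 1, 2, 2]

# Boolean mask: True where a row's grade equals its class minimum.
is_best = df['grade'] == group_min
print(is_best.tolist())       # [False, False, True, True, True]

# Keep only the best rows; ties are all kept.
best = df[is_best]
print(best['name'].tolist())  # ['max3', 'lisa2', 'lisa3']
```

Because the mask keeps every row that ties for the minimum, the second groupby with agg is what collapses those ties into lists.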

Answered By: Himanshuman

Answer 2

Try this:

df.groupby(['class','grade']).agg(list).groupby('class').head(1)

or

df.loc[df['grade'].eq(df.groupby('class')['grade'].transform('min'))].groupby(['class','grade']).agg(list).reset_index()

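Output of the first variant (the second returns the same rows, but with class and grade as regular columns because of reset_index()):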

                       name  gender
class grade                        
A     1              [max3]     [m]
B     2      [lisa2, lisa3]  [f, f]
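
The first variant works because groupby sorts the group keys by default, so within each class the lowest grade comes first and head(1) picks it. A small sketch of that reasoning, assuming the modified data where lisa3 has grade 2 (per_grade and best are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'class': ['A', 'A', 'A', 'B', 'B'],
    'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
    'gender': ['m', 'f', 'm', 'f', 'f'],
    'grade': [2, 3, 1, 2, 2],
})

# One row per (class, grade) pair; keys are sorted by class, then grade.
per_grade = df.groupby(['class', 'grade']).agg(list)
print(per_grade)

# Group again by the 'class' index level and keep the first row of each
# class, which is the lowest grade thanks to the sorted keys.
best = per_grade.groupby('class').head(1)
print(best)
```

If sort=False were passed to the first groupby, the rows would follow order of appearance instead, and head(1) would no longer be guaranteed to return the lowest grade.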

Answered By: rhug123

Answer 3

You can use groupby with apply:

df.groupby("class", as_index=False).apply(lambda x: x[x['grade'].eq(x['grade'].min())].
                                          groupby(['class', 'grade'], as_index=False).
                                          agg(list)).reset_index(drop=True)

  class  grade            name  gender
0     A      1          [max3]     [m]
1     B      2  [lisa2, lisa3]  [f, f]

Answered By: Space Impact
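
The snippets above all cover the "aggregate into a list" case. For the "first occurrence" case from the question, one possible sketch uses idxmin, which returns the index label of the first minimum in each group. The names first_best_idx and first_best are illustrative, and the data again assumes lisa3 has grade 2.

```python
import pandas as pd

df = pd.DataFrame({
    'class': ['A', 'A', 'A', 'B', 'B'],
    'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
    'gender': ['m', 'f', 'm', 'f', 'f'],
    'grade': [2, 3, 1, 2, 2],
})

# Index label of the first row holding the minimum grade in each class.
first_best_idx = df.groupby('class')['grade'].idxmin()

# Look those rows up in the original frame and renumber the result.
first_best = df.loc[first_best_idx].reset_index(drop=True)
print(first_best)
```

With the tie in class B, this keeps lisa2 (the earlier row) and drops lisa3, so each class contributes exactly one row.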
