How to select best row of a grouped dataframe in Pandas

Question:

Let's say I have the following dataframe:

import pandas as pd

df = pd.DataFrame.from_dict({'class': ['A', 'A', 'A', 'B', 'B'],
                             'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                             'gender': ['m', 'f', 'm', 'f', 'f'],
                             'grade': [2, 3, 1, 2, 4]})
>>> print(df)
  class   name gender  grade
0     A   max1      m      2
1     A  lisa1      f      3
2     A   max3      m      1
3     B  lisa2      f      2
4     B  lisa3      f      4

I want to get the best student(s) of each class (a lower grade is better):

  class  grade   name gender
0     A      1   max3      m
1     B      2  lisa2      f

How would I achieve that with pandas?
Also, if the best grade is not unique (unlike in the example above), how can I either select the first occurrence of the best grade or aggregate the tied rows into lists?

To clarify, assume that lisa3 has a grade of 2 instead of 4. Then the results I would like are:

  • First occurrence: same as above
  • Aggregating into lists:
  class  grade            name  gender
0     A      1          [max3]     [m]
1     B      2  [lisa2, lisa3]  [f, f]
Asked By: v.tralala


Answers:

You can use pandas.DataFrame.groupby to group the students by class and use transform to get the minimum grade of each class. Keep only the rows whose grade equals that minimum, then aggregate the tied rows into lists using pandas.Series.tolist.

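# Same data as in the question, but lisa3 now has grade 2 (a tie in class B)
df = pd.DataFrame.from_dict({'class': ['A', 'A', 'A', 'B', 'B'],
                             'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                             'gender': ['m', 'f', 'm', 'f', 'f'],
                             'grade': [2, 3, 1, 2, 2]})

# Keep only the rows whose grade equals the minimum grade of their class
temp = df[df.grade == df.groupby(['class'])['grade'].transform('min')]
# Collect the names and genders of the remaining rows into lists
temp.groupby(['class', 'grade']).agg(pd.Series.tolist)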

This gives you the expected output:

                       name  gender
class grade                        
A     1              [max3]     [m]
B     2      [lisa2, lisa3]  [f, f]

transform('min') computes the minimum of each group and returns an object with the same index as the original, filled with the transformed values. Here it is called on the grade column, so it returns a Series in which every row holds the minimum grade of that row's class.

So df.groupby(['class'])['grade'].transform('min') gives us:

0    1
1    1
2    1
3    2
4    2

where 1 and 2 are the minimum values of their respective group, replicated for each row belonging to that group.
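
The question also asks for the first occurrence of the best grade when there is a tie. The following is a minimal sketch of one way to do that, assuming idxmin is acceptable here: it returns the row label of the first minimum in each group, so no explicit tie handling is needed. The variable names best_idx and first_best are illustrative only.

```python
import pandas as pd

# Data from the question, with lisa3's grade set to 2 to create a tie in class B
df = pd.DataFrame({'class': ['A', 'A', 'A', 'B', 'B'],
                   'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                   'gender': ['m', 'f', 'm', 'f', 'f'],
                   'grade': [2, 3, 1, 2, 2]})

# Row label of the first minimum grade within each class
best_idx = df.groupby('class')['grade'].idxmin()

# Select those rows and order the columns as in the question
first_best = (df.loc[best_idx, ['class', 'grade', 'name', 'gender']]
                .reset_index(drop=True))
print(first_best)
```

With the tied data, class B resolves to lisa2 because that row appears before lisa3 in the original frame.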

Answered By: Himanshuman

Try this:

df.groupby(['class','grade']).agg(list).groupby('class').head(1)

or

df.loc[df['grade'].eq(df.groupby('class')['grade'].transform('min'))].groupby(['class','grade']).agg(list).reset_index()

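Output (of the first snippet; the second one ends with reset_index(), so class and grade become regular columns instead of index levels):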

                       name  gender
class grade                        
A     1              [max3]     [m]
B     2      [lisa2, lisa3]  [f, f]
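
The first one-liner works because groupby sorts its keys in ascending order by default, so the lowest grade is the first row within each class. Here is a sketch that splits it into two steps to make that assumption visible; the intermediate names per_grade and best are illustrative only.

```python
import pandas as pd

# Data from the question, with lisa3's grade set to 2 to create a tie in class B
df = pd.DataFrame({'class': ['A', 'A', 'A', 'B', 'B'],
                   'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                   'gender': ['m', 'f', 'm', 'f', 'f'],
                   'grade': [2, 3, 1, 2, 2]})

# Step 1: one row per (class, grade) pair; sort=True (the default) orders
# the grades ascending within each class
per_grade = df.groupby(['class', 'grade'], sort=True).agg(list)

# Step 2: the first row of each class is therefore the one with the lowest grade
best = per_grade.groupby(level='class').head(1)
print(best)
```

If a higher grade were better, this approach would need tail(1) instead of head(1), whereas the transform-based filter would only need 'max' in place of 'min'.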
Answered By: rhug123

You can use groupby with apply:

df.groupby("class", as_index=False).apply(lambda x: x[x['grade'].eq(x['grade'].min())].
                                          groupby(['class', 'grade'], as_index=False).
                                          agg(list)).reset_index(drop=True)

  class  grade            name  gender
0     A      1          [max3]     [m]
1     B      2  [lisa2, lisa3]  [f, f]
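
As a variation on the same idea, here is a sketch that avoids the nested groupby by building each output row in a named function. It assumes that selecting the non-grouping columns explicitly before apply is acceptable, which keeps the function from depending on the grouping column being passed in. The helper best_of_class is hypothetical and not part of pandas.

```python
import pandas as pd

# Data from the question, with lisa3's grade set to 2 to create a tie in class B
df = pd.DataFrame({'class': ['A', 'A', 'A', 'B', 'B'],
                   'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                   'gender': ['m', 'f', 'm', 'f', 'f'],
                   'grade': [2, 3, 1, 2, 2]})


def best_of_class(group):
    """Build one output row from the rows of a single class (hypothetical helper)."""
    # Keep only the rows that share the minimum grade of this class
    best = group[group['grade'].eq(group['grade'].min())]
    return pd.Series({'grade': best['grade'].iloc[0],
                      'name': best['name'].tolist(),
                      'gender': best['gender'].tolist()})


# Apply the helper per class, then turn the class index back into a column
result = (df.groupby('class')[['grade', 'name', 'gender']]
            .apply(best_of_class)
            .reset_index())
print(result)
```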
Answered By: Space Impact