Extract row with maximum value in a group pandas dataframe

Question:

A similar question is asked here:
Python : Getting the Row which has the max value in groups using groupby

However, I need just one record per group, even if more than one record in that group has the maximum value.

In the example below, I need one record for “s2”. For me it doesn’t matter which one.

>>> from pandas import DataFrame
>>> df = DataFrame({'Sp':['a','b','c','d','e','f'], 'Mt':['s1', 's1', 's2','s2','s2','s3'], 'Value':[1,2,3,4,5,6], 'count':[3,2,5,10,10,6]})
>>> df
   Mt Sp  Value  count
0  s1  a      1      3
1  s1  b      2      2
2  s2  c      3      5
3  s2  d      4     10
4  s2  e      5     10
5  s3  f      6      6
>>> idx = df.groupby(['Mt'])['count'].transform(max) == df['count']
>>> df[idx]
   Mt Sp  Value  count
0  s1  a      1      3
3  s2  d      4     10
4  s2  e      5     10
5  s3  f      6      6
Asked By: user1140126

||

Answers:

You can use first

In [14]: df.groupby('Mt').first()
Out[14]: 
   Sp  Value  count
Mt                 
s1  a      1      3
s2  c      3      5
s3  f      6      6

Update

Set as_index=False to achieve your goal

In [28]: df.groupby('Mt', as_index=False).first()
Out[28]: 
   Mt Sp  Value  count
0  s1  a      1      3
1  s2  c      3      5
2  s3  f      6      6 

Update Again

Sorry, I misunderstood what you meant. If you want the row with the maximum count in each group, sort by count first:

In [196]: df.sort_values('count', ascending=False).groupby('Mt', as_index=False).first()
Out[196]: 
   Mt Sp  Value  count
0  s1  a      1      3
1  s2  e      5     10
2  s3  f      6      6
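
A minimal sketch of the same idea on a current pandas version, using the sample frame from the question. Two choices here are assumptions rather than part of the answer above: `kind='mergesort'` is passed to make the sort stable, and `head(1)` is used instead of `first()` because `first()` returns the first non-null value of each column, which can mix values from different rows when a group contains missing data.

```python
import pandas as pd

df = pd.DataFrame({
    'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
    'Value': [1, 2, 3, 4, 5, 6],
    'count': [3, 2, 5, 10, 10, 6],
})

# Sort so the largest counts come first, then keep the first
# complete row of every group and restore the group order.
top = (
    df.sort_values('count', ascending=False, kind='mergesort')
      .groupby('Mt')
      .head(1)
      .sort_values('Mt')
)
print(top)
```

When several rows tie for the maximum, which one is kept depends on the sort order; the question states that any of them is acceptable.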
Answered By: waitingkuo

To get the first occurrence of the maximum count in each group, you can use the pandas.Series.idxmax() function:

>>> df.loc[df.groupby(['Mt']).apply(lambda x: x['count'].idxmax())]
   Mt Sp  Value  count
0  s1  a      1      3
3  s2  d      4     10
5  s3  f      6      6
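
The same lookup can probably be written without `apply`, by calling `idxmax()` on the grouped column directly. This is a sketch on the question's sample data; it selects with `df.loc` because `idxmax()` returns index labels rather than positions.

```python
import pandas as pd

df = pd.DataFrame({
    'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
    'Value': [1, 2, 3, 4, 5, 6],
    'count': [3, 2, 5, 10, 10, 6],
})

# For each group, idxmax gives the index label of the first row
# holding the maximum 'count'.
idx = df.groupby('Mt')['count'].idxmax()

# Select those rows by label.
result = df.loc[idx]
print(result)
```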
Answered By: Roman Pekar

Playing off of Roman Pekar’s answer, I found that the following code would work:

from math import isnan
df.iloc[[int(x) for x in df.groupby(by=df.Mt).apply(lambda x: x['count'].idxmax()).values if not isnan(x)]]

Note the isnan condition, as my application had some nan entries in the column we are maximizing over.
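
One way to sidestep the NaN check, sketched here on made-up data (the frame below is hypothetical, not from the question), is to drop rows with a missing count before grouping. Groups whose counts are all missing then simply do not appear in the result.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 's1' has one missing count, 's3' has only missing counts.
df = pd.DataFrame({
    'Mt': ['s1', 's1', 's2', 's2', 's3'],
    'Sp': ['a', 'b', 'c', 'd', 'e'],
    'count': [3.0, np.nan, 10.0, 10.0, np.nan],
})

# Remove rows where the column being maximized is missing.
valid = df.dropna(subset=['count'])

# Pick the first row with the maximum count in each remaining group.
result = valid.loc[valid.groupby('Mt')['count'].idxmax()]
print(result)
```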

Answered By: Ian Schultz

The answers already given don’t clearly show what is by far the fastest option.
Sort by the column whose maximum you want, then drop duplicates (drop_duplicates takes as a parameter the names of the columns to consider when identifying duplicates):

df.sort_values('count', ascending=False).drop_duplicates(['Mt'])

NB: Yes, this answer is already given in a comment on the question, but it is very easy to miss. It can be up to 10 times faster than groupby.
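
A self-contained sketch of this approach on the question's sample data. The `kind='mergesort'` argument is an added assumption that makes the sort stable; `drop_duplicates` keeps the first row it sees for each `Mt` value by default.

```python
import pandas as pd

df = pd.DataFrame({
    'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
    'Value': [1, 2, 3, 4, 5, 6],
    'count': [3, 2, 5, 10, 10, 6],
})

# Put the largest counts first, then keep only the first row
# encountered for each value of 'Mt'.
result = (
    df.sort_values('count', ascending=False, kind='mergesort')
      .drop_duplicates(subset='Mt')
)
print(result)
```

The result is ordered by descending count rather than by group; append `.sort_values('Mt')` or `.sort_index()` if a different order is needed.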

Answered By: jmd