How to get the whole row based on a max value from one column in pandas.groupby().max()?

Question:

I need the whole row that contains the maximum value, not separate maximum values taken from different rows. In my example the maximum should be based on the column 'Number'. For example:

import pandas as pd

data = {
    'Number':[12,55,3,2,88,17],
    'People':['Zack','Zack','Merry','Merry','Cross','Cross'],
    'Random':[353,0.5454,0.5454336,32,-7,4]
}

df = pd.DataFrame(data, columns=['Number', 'People', 'Random'])

print(df, '\n')

max_values = df.groupby('People').max()

print(max_values)

Here is the result:

   Number People      Random
0      12   Zack  353.000000
1      55   Zack    0.545400
2       3  Merry    0.545434
3       2  Merry   32.000000
4      88  Cross   -7.000000
5      17  Cross    4.000000 

        Number  Random
People                
Cross       88     4.0
Merry        3    32.0
Zack        55   353.0

Here is the expected result for max_values:

        Number      Random
People
Cross       88   -7.000000
Merry        3    0.545434
Zack        55  353.000000
Asked By: Samir Ahmane


Answers:

You could do the following:

import pandas as pd

data = {
    'Number':[12,55,3,2,88,17],
    'People':['Zack','Zack','Merry','Merry','Cross','Cross'],
    'Random':[353,0.5454,0.5454336,32,-7,4]
}

df = pd.DataFrame(data, columns=['Number', 'People', 'Random'])

print(df, '\n')

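# Keep only the rows whose Number equals the maximum Number of their group.
# The string 'max' is used instead of the builtin max, which newer pandas versions warn about.
res = df[df.groupby(['People'])['Number'].transform('max') == df['Number']].set_index('People')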
print(res)

Which gives the following output:

        Number    Random
People                  
Zack        55  0.545400
Merry        3  0.545434
Cross       88 -7.000000

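The problem in your code is that groupby().max() computes the maximum of each column independently, so the values in one result row can come from different original rows. Filtering with a boolean mask that keeps only the rows where Number equals its group's maximum avoids this.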
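
To make the per-column behaviour concrete, here is a minimal sketch that compares groupby().max() with a whole-row lookup. It uses idxmax() with .loc, which is an alternative to the mask above and not part of the original answer. The names per_column, row_labels and whole_rows are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Number': [12, 55, 3, 2, 88, 17],
    'People': ['Zack', 'Zack', 'Merry', 'Merry', 'Cross', 'Cross'],
    'Random': [353, 0.5454, 0.5454336, 32, -7, 4],
})

# groupby().max() takes the maximum of each column on its own, so Number
# and Random in one result row can come from different original rows.
per_column = df.groupby('People').max()

# idxmax() gives, per group, the index label of the row with the largest
# Number; .loc then selects those complete rows.
row_labels = df.groupby('People')['Number'].idxmax()
whole_rows = df.loc[row_labels].set_index('People')

print(per_column, '\n')
print(whole_rows)
```

Note that idxmax() returns only the first matching row per group when the maximum is tied, whereas the boolean mask keeps every tied row.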

Note: the expected output in the question contains a mistake. For Zack, the row with Number 55 has Random 0.5454, not 353.0.

Answered By: David

You could try something like this:

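# Helper column holding each group's maximum Number (string 'max' avoids a warning in newer pandas).
df['max_number'] = df.groupby(['People'])['Number'].transform('max')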
df[df.Number == df.max_number].drop('max_number', axis=1).set_index('People')

        Number    Random
People                  
Zack        55  0.545400
Merry        3  0.545434
Cross       88 -7.000000
Answered By: Sajan

This is a more straightforward way to do it, IMHO.

df.sort_values('Number').groupby('People').tail(1)

(Maybe also rename your 'People' column to "Name".)
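
One difference from the mask-based answers is worth checking: how ties are handled. The sketch below uses a small hypothetical frame (tied, not the question's data) in which one group has two rows sharing the maximum. The kind='stable' argument is an addition here so that the surviving tied row is predictable.

```python
import pandas as pd

# Hypothetical data with a tie: Zack has two rows with Number == 55.
tied = pd.DataFrame({
    'Number': [55, 55, 3],
    'People': ['Zack', 'Zack', 'Merry'],
    'Random': [1.0, 2.0, 3.0],
})

# Boolean mask: keeps every row that equals its group's maximum.
mask_rows = tied[tied.groupby('People')['Number'].transform('max') == tied['Number']]

# Sort, then tail(1): keeps exactly one row per group. With a stable sort,
# the tied row that appears later in the original order is the one kept.
tail_rows = tied.sort_values('Number', kind='stable').groupby('People').tail(1)

print(mask_rows, '\n')
print(tail_rows)
```

So the sort-and-tail approach suits the case where exactly one row per group is wanted, while the mask suits the case where all tied rows should be returned.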

Answered By: jlansey