How to get the whole row based on a max value from one column in pandas.groupby().max()?
Question:
I need the whole row that contains the maximum value, not separate maxima of each column taken from different rows. In my example the maximum should be based on the column ‘Number’. Here is what I tried:
import pandas as pd
data = {
'Number':[12,55,3,2,88,17],
'People':['Zack','Zack','Merry','Merry','Cross','Cross'],
'Random':[353,0.5454,0.5454336,32,-7,4]
}
df = pd.DataFrame(data, columns=['Number', 'People', 'Random'])
print(df, '\n')
max_values = df.groupby('People').max()
print(max_values)
Here is the result:
   Number People      Random
0      12   Zack  353.000000
1      55   Zack    0.545400
2       3  Merry    0.545434
3       2  Merry   32.000000
4      88  Cross   -7.000000
5      17  Cross    4.000000

        Number  Random
People
Cross       88     4.0
Merry        3    32.0
Zack        55   353.0
Here is the expected result for max_values:
        Number      Random
People
Cross       88   -7.000000
Merry        3    0.545434
Zack        55  353.0
Answers:
You could do the following:
import pandas as pd
data = {
'Number':[12,55,3,2,88,17],
'People':['Zack','Zack','Merry','Merry','Cross','Cross'],
'Random':[353,0.5454,0.5454336,32,-7,4]
}
df = pd.DataFrame(data, columns=['Number', 'People', 'Random'])
print(df, '\n')
# Keep only the rows whose Number equals the maximum Number of their group
res = df[df.groupby(['People'])['Number'].transform('max') == df['Number']].set_index('People')
print(res)
Which gives the following output:
        Number    Random
People
Zack        55  0.545400
Merry        3  0.545434
Cross       88 -7.000000
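The problem in your code was that max() is applied to each column independently, so the values in one result row can come from different source rows. Filtering the original rows with a boolean mask avoids this issue.
Note: the expected output in the question contains a mistake. The row for Zack with Number 55 has Random 0.5454, not 353.0.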
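To make the difference concrete, here is a minimal sketch that puts the per-column maximum next to the mask-based selection for the same data. The variable names per_column, mask and whole_rows are only illustrative and do not come from the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Number': [12, 55, 3, 2, 88, 17],
    'People': ['Zack', 'Zack', 'Merry', 'Merry', 'Cross', 'Cross'],
    'Random': [353, 0.5454, 0.5454336, 32, -7, 4],
})

# Per-column maxima: each column is reduced independently within a group,
# so Number and Random in one result row may come from different source rows.
per_column = df.groupby('People').max()

# Row-wise selection: keep the rows whose Number equals the group's maximum.
mask = df['Number'] == df.groupby('People')['Number'].transform('max')
whole_rows = df[mask].set_index('People')

print(per_column.loc['Zack', 'Random'])  # 353.0, from the row with Number == 12
print(whole_rows.loc['Zack', 'Random'])  # 0.5454, from the row with Number == 55
```

Note that if several rows in a group tie for the maximum Number, the mask keeps all of them.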
You could try something like this:
df['max_number'] = df.groupby(['People'])['Number'].transform('max')
df[df.Number == df.max_number].drop('max_number', axis=1).set_index('People')
        Number    Random
People
Zack        55  0.545400
Merry        3  0.545434
Cross       88 -7.000000
This is a more straightforward way to do it, IMHO:
df.sort_values('Number').groupby('People').tail(1)
(Maybe also change your column name to "Name")
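Another option, which none of the answers above uses, is to look up the index label of each group's maximum with idxmax() and select those rows with .loc. The sketch below assumes the default integer index from the question; the names idx and res are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Number': [12, 55, 3, 2, 88, 17],
    'People': ['Zack', 'Zack', 'Merry', 'Merry', 'Cross', 'Cross'],
    'Random': [353, 0.5454, 0.5454336, 32, -7, 4],
})

# Index label of the row holding the largest Number within each group.
idx = df.groupby('People')['Number'].idxmax()

# Select those rows by label; the result follows the sorted group keys.
res = df.loc[idx].set_index('People')
print(res)
```

Unlike the transform-based mask, this returns exactly one row per group, the first occurrence, when several rows tie for the maximum.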