get max value after grouping two columns pandas

Question:

I want to get the trip_time with the highest count for each age_group. My data looks like this:

age_group trip_time counts
18 – 30yrs 01am 23
18 – 30yrs 02am 2
18 – 30yrs 03am 213
31 – 50yrs 01am 74
31 – 50yrs 02am 211
31 – 50yrs 03am 852
51 – 70yrs 01am 23
51 – 70yrs 02am 11
51 – 70yrs 03am 101

Expected output:

age_group trip_time counts
18 – 30yrs 03am 213
31 – 50yrs 03am 852
51 – 70yrs 03am 101

I tried:

trip_time_age_group.groupby(['age_group', 'trip_time'])['counts'].max()

But it gives me the wrong result.

Asked By: Khola


Answers:

Group by age_group only, then use transform('max') to broadcast each group's maximum count back onto every row of that group. Comparing that result with the counts column of the DataFrame gives a boolean mask that selects the rows you want:

df.loc[df.groupby('age_group')['counts'].transform('max').eq(df['counts'])]
    age_group trip_time  counts
2  18 - 30yrs      03am     213
5  31 - 50yrs      03am     852
8  51 - 70yrs      03am     101
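
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the question's table has been loaded into a DataFrame called df (the construction below is an illustration, not the asker's actual loading code). It also shows why grouping by both columns does not help: every (age_group, trip_time) pair occurs once, so max() returns all nine values unchanged.

```python
import pandas as pd

# Assumed reconstruction of the question's data
df = pd.DataFrame({
    "age_group": ["18 - 30yrs"] * 3 + ["31 - 50yrs"] * 3 + ["51 - 70yrs"] * 3,
    "trip_time": ["01am", "02am", "03am"] * 3,
    "counts": [23, 2, 213, 74, 211, 852, 23, 11, 101],
})

# The original attempt: each (age_group, trip_time) pair is its own group,
# so the "max" of each one-row group is just that row's value.
per_pair = df.groupby(["age_group", "trip_time"])["counts"].max()

# transform('max') returns a Series aligned with df's index, holding the
# maximum of the row's age_group in every position.
group_max = df.groupby("age_group")["counts"].transform("max")

# Keep the rows whose count equals their group's maximum.
result = df.loc[group_max.eq(df["counts"])]
print(result)
```

Note that if two trip times tie for the highest count within an age group, this mask keeps both rows.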
Answered By: Naveed

Groupby only age_group, then find the rows in each group with the max counts.

df.groupby('age_group').apply(
    lambda sf: sf.loc[sf['counts'] == sf['counts'].max()]
).reset_index(drop=True)
    age_group trip_time  counts
0  18 - 30yrs      03am     213
1  31 - 50yrs      03am     852
2  51 - 70yrs      03am     101

You could also do sf.query('counts == counts.max()') instead of sf.loc[...].
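
A different technique, not the one used above, is idxmax, which avoids apply altogether. The sketch below assumes the same data as in the question, rebuilt by hand as df. Unlike the equality comparisons, idxmax returns exactly one row label per group (the first occurrence if there is a tie).

```python
import pandas as pd

# Assumed reconstruction of the question's data
df = pd.DataFrame({
    "age_group": ["18 - 30yrs"] * 3 + ["31 - 50yrs"] * 3 + ["51 - 70yrs"] * 3,
    "trip_time": ["01am", "02am", "03am"] * 3,
    "counts": [23, 2, 213, 74, 211, 852, 23, 11, 101],
})

# For each age_group, the index label of the row holding the largest count
best_idx = df.groupby("age_group")["counts"].idxmax()

# Select those rows from the original frame and renumber them from 0
result = df.loc[best_idx].reset_index(drop=True)
print(result)
```

Which variant to prefer depends on whether ties should produce several rows per group or only one.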

Answered By: wjandrea