get max value after grouping two columns pandas
Question:
I want to get the trip_time with the highest count for each age_group
age_group
trip_time
counts
18 – 30yrs
01am
23
18 – 30yrs
02am
2
18 – 30yrs
03am
213
31 – 50yrs
01am
74
31 – 50yrs
02am
211
31 – 50yrs
03am
852
51 – 70yrs
01am
23
51 – 70yrs
02am
11
51 – 70yrs
03am
101
Expected output:
age_group
trip_time
counts
18 – 30yrs
03am
213
31 – 50yrs
03am
852
51 – 70yrs
03am
101
trip_time_age_group.groupby(['age_group', 'trip_time'])['counts'].max()
But it gives me wrong result
Answers:
groupby ‘age-group’ and then use transform to get the max count for each age-group, which you then compare with the count of the DF, to get the resultset
df.loc[df.groupby('age_group')['counts'].transform('max').eq(df['counts'])]
age_group trip_time counts
2 18 - 30yrs 03am 213
5 31 - 50yrs 03am 852
8 51 - 70yrs 03am 101
Groupby only age_group
, then find the rows in each group with the max counts
.
df.groupby('age_group').apply(
lambda sf: sf.loc[sf['counts'] == sf['counts'].max()]
).reset_index(drop=True)
age_group trip_time counts
0 18 - 30yrs 03am 213
1 31 - 50yrs 03am 852
2 51 - 70yrs 03am 101
You could also do sf.query('counts == counts.max()')
instead of sf.loc[...]
.
I want to get the trip_time with the highest count for each age_group
age_group | trip_time | counts |
---|---|---|
18 – 30yrs | 01am | 23 |
18 – 30yrs | 02am | 2 |
18 – 30yrs | 03am | 213 |
31 – 50yrs | 01am | 74 |
31 – 50yrs | 02am | 211 |
31 – 50yrs | 03am | 852 |
51 – 70yrs | 01am | 23 |
51 – 70yrs | 02am | 11 |
51 – 70yrs | 03am | 101 |
Expected output:
age_group | trip_time | counts |
---|---|---|
18 – 30yrs | 03am | 213 |
31 – 50yrs | 03am | 852 |
51 – 70yrs | 03am | 101 |
trip_time_age_group.groupby(['age_group', 'trip_time'])['counts'].max()
But it gives me wrong result
groupby ‘age-group’ and then use transform to get the max count for each age-group, which you then compare with the count of the DF, to get the resultset
df.loc[df.groupby('age_group')['counts'].transform('max').eq(df['counts'])]
age_group trip_time counts
2 18 - 30yrs 03am 213
5 31 - 50yrs 03am 852
8 51 - 70yrs 03am 101
Groupby only age_group
, then find the rows in each group with the max counts
.
df.groupby('age_group').apply(
lambda sf: sf.loc[sf['counts'] == sf['counts'].max()]
).reset_index(drop=True)
age_group trip_time counts
0 18 - 30yrs 03am 213
1 31 - 50yrs 03am 852
2 51 - 70yrs 03am 101
You could also do sf.query('counts == counts.max()')
instead of sf.loc[...]
.