Pandas Groupby get max of multiple columns but in order
Question:
I have below pandas dataframe
COlA ColB Result Freq
A B 1 3000
A C 0.2 4000
A D 1 5000
A E 0.3 9000
A F 0.4 8000
B A 0.4 1000
B C 0.1 4000
B D 0.1 5000
B E 0.2 9000
B F 0.3 8000
...
I want to groupby
ColA
and get max of Result
and Types
in order i.e. in such a way that it should first find the max of Result
column if there are more than one max row then look at Freq
and find the max there. I’ve tried using groupby().max().reset_index()
but not getting the desired output
Expected Output
COlA ColB Result Freq
A D 1 5000
B A 0.4 1000
...
Answers:
You can sort by Results/Freq and then groupby
+first
:
(df.sort_values(by=['Result', 'Freq'], ascending=False)
.groupby(['COlA'], as_index=False).first()
)
output:
COlA ColB Result Freq
0 A D 1.0 5000
1 B A 0.4 1000
NB. warning your column name is COlA
(with a capital O)
def function1(dd:pd.DataFrame):
return dd.sort_values(by=['Result','Freq'],ascending=[False,False]).head(1)
df1.groupby('COlA').apply(function1).reset_index(drop=True)
out
COlA ColB Result Freq
0 A D 1.0 5000
1 B A 0.4 1000
I have below pandas dataframe
COlA ColB Result Freq
A B 1 3000
A C 0.2 4000
A D 1 5000
A E 0.3 9000
A F 0.4 8000
B A 0.4 1000
B C 0.1 4000
B D 0.1 5000
B E 0.2 9000
B F 0.3 8000
...
I want to groupby
ColA
and get max of Result
and Types
in order i.e. in such a way that it should first find the max of Result
column if there are more than one max row then look at Freq
and find the max there. I’ve tried using groupby().max().reset_index()
but not getting the desired output
Expected Output
COlA ColB Result Freq
A D 1 5000
B A 0.4 1000
...
You can sort by Results/Freq and then groupby
+first
:
(df.sort_values(by=['Result', 'Freq'], ascending=False)
.groupby(['COlA'], as_index=False).first()
)
output:
COlA ColB Result Freq
0 A D 1.0 5000
1 B A 0.4 1000
NB. warning your column name is COlA
(with a capital O)
def function1(dd:pd.DataFrame):
return dd.sort_values(by=['Result','Freq'],ascending=[False,False]).head(1)
df1.groupby('COlA').apply(function1).reset_index(drop=True)
out
COlA ColB Result Freq
0 A D 1.0 5000
1 B A 0.4 1000