Pandas Dataframe get maximum with respect to other entries
Question:
I have a Dataframe like this:
name
phase
value
BOB
1
.9
BOB
2
.05
BOB
3
.05
JOHN
2
.45
JOHN
3
.45
JOHN
4
.05
FRANK
1
.4
FRANK
3
.6
I want to find which entry in column ‘phase’ has the maximum value in column ‘value’.
If more than one share the same maximum value keep the first or a random value for ‘phase’.
Desired result table:
name
phase
value
BOB
1
.9
JOHN
2
.45
FRANK
3
.6
my approach was:
df.groupby(['name'])[['phase','value']].max()
but it returned incorrect values.
Answers:
Try to sort the dataframe first:
df = df.sort_values(
by=["name", "value", "phase"], ascending=[True, False, True]
)
x = df.groupby("name", as_index=False).first()
print(x)
Prints:
name phase value
0 BOB 1 0.90
1 FRANK 1 0.60
2 JOHN 1 0.45
You don’t need to use groupby
. Sort values by value
and phase
(adjust the order if necessary) and drop duplicates by name
:
out = (df.sort_values(['value', 'phase'], ascending=[False, True])
.drop_duplicates('name')
.sort_index(ignore_index=True))
print(out)
# Output
name phase value
0 BOB 1 0.90
1 JOHN 2 0.45
2 FRANK 3 0.60
A possible solution, that could avoid sorting is with groupby:
df.loc[df.groupby('name', sort = False).value.idxmax()]
name phase value
0 BOB 1 0.90
3 JOHN 2 0.45
7 FRANK 3 0.60
You may check
out = df.sort_values('value',ascending=False).drop_duplicates('name').sort_index()
Out[434]:
name phase value
0 BOB 1 0.90
3 JOHN 2 0.45
7 FRANK 3 0.60
I have a Dataframe like this:
name | phase | value |
---|---|---|
BOB | 1 | .9 |
BOB | 2 | .05 |
BOB | 3 | .05 |
JOHN | 2 | .45 |
JOHN | 3 | .45 |
JOHN | 4 | .05 |
FRANK | 1 | .4 |
FRANK | 3 | .6 |
I want to find which entry in column ‘phase’ has the maximum value in column ‘value’.
If more than one share the same maximum value keep the first or a random value for ‘phase’.
Desired result table:
name | phase | value |
---|---|---|
BOB | 1 | .9 |
JOHN | 2 | .45 |
FRANK | 3 | .6 |
my approach was:
df.groupby(['name'])[['phase','value']].max()
but it returned incorrect values.
Try to sort the dataframe first:
df = df.sort_values(
by=["name", "value", "phase"], ascending=[True, False, True]
)
x = df.groupby("name", as_index=False).first()
print(x)
Prints:
name phase value
0 BOB 1 0.90
1 FRANK 1 0.60
2 JOHN 1 0.45
You don’t need to use groupby
. Sort values by value
and phase
(adjust the order if necessary) and drop duplicates by name
:
out = (df.sort_values(['value', 'phase'], ascending=[False, True])
.drop_duplicates('name')
.sort_index(ignore_index=True))
print(out)
# Output
name phase value
0 BOB 1 0.90
1 JOHN 2 0.45
2 FRANK 3 0.60
A possible solution, that could avoid sorting is with groupby:
df.loc[df.groupby('name', sort = False).value.idxmax()]
name phase value
0 BOB 1 0.90
3 JOHN 2 0.45
7 FRANK 3 0.60
You may check
out = df.sort_values('value',ascending=False).drop_duplicates('name').sort_index()
Out[434]:
name phase value
0 BOB 1 0.90
3 JOHN 2 0.45
7 FRANK 3 0.60