How to find the smallest maximum of a column with pandas after filtering?
Question:
I have a dataframe:
import pandas as pd
df = pd.DataFrame(
{'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'variable': [8, 9, 10, 11, 2, 3, 4, 5],
'another_variable': [1, 1, 1, 2, 1, 1, 2, 2]}
)
I would like to find, for each team, the largest value of variable (which counts upwards) while another_variable is still equal to 1.
I can group the data frame and filter the relevant rows:
df.groupby(['team']).apply(lambda g: g[g['another_variable'] == 1])
# Output:
#         team  variable  another_variable
# team
# A    0     A         8                 1
#      1     A         9                 1
#      2     A        10                 1
# B    4     B         2                 1
#      5     B         3                 1
But if I add .variable.min(), I only get a single value instead of one value for each group (whose maximum I could then calculate). What am I doing wrong?
Answers:
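groupby(...).apply(...) returns one combined DataFrame holding the filtered rows of all groups, so .variable.min() reduces the whole column to a single number. Filter first, then groupby: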
df[df['another_variable'].eq(1)].groupby('team')['variable'].max()
Output:
team
A    10
B     3
Name: variable, dtype: int64
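If you want to keep the question's apply-based approach, its result can still be reduced per team by grouping again on its team index level. Below is a minimal sketch, assuming a reasonably recent pandas. group_keys=True is passed explicitly so the team level is kept. Only the non-grouping columns are selected before apply, which avoids the deprecation warning newer pandas emits when apply sees the grouping column.

```python
import pandas as pd

df = pd.DataFrame(
    {'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
     'variable': [8, 9, 10, 11, 2, 3, 4, 5],
     'another_variable': [1, 1, 1, 2, 1, 1, 2, 2]}
)

# The question's approach: one combined DataFrame indexed by (team, original row).
filtered = df.groupby('team', group_keys=True)[['variable', 'another_variable']].apply(
    lambda g: g[g['another_variable'] == 1]
)

# A plain column reduction spans every group at once -> a single number.
overall_min = filtered['variable'].min()

# Grouping again on the 'team' index level gives one value per team.
per_team_max = filtered.groupby(level='team')['variable'].max()

# Same result as filtering first and then grouping.
direct = df[df['another_variable'].eq(1)].groupby('team')['variable'].max()

print(overall_min)
print(per_team_max)
```

Filtering first is still the simpler choice, because it never builds the intermediate MultiIndex frame.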
If a group may contain no 1 at all and you'd like NaN for it, use:
df['variable'].where(df['another_variable'].eq(1)).groupby(df['team']).max()
Example output if A contained no 1:
team
A    NaN
B    3.0
Name: variable, dtype: float64
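The sketch below puts the two approaches side by side. It uses a hypothetical modification of the question's data in which team A has no 1 at all. Filtering first drops A from the result. The where() version keeps every row but masks non-matching values as NaN, so A survives with a NaN maximum, and the result is upcast to float64.

```python
import pandas as pd

# Hypothetical variant of the question's data: team A has no 1 at all.
df = pd.DataFrame(
    {'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
     'variable': [8, 9, 10, 11, 2, 3, 4, 5],
     'another_variable': [2, 2, 2, 2, 1, 1, 2, 2]}
)

# Filtering first removes team A's rows entirely, so A is absent from the result.
filtered_first = df[df['another_variable'].eq(1)].groupby('team')['variable'].max()

# where() replaces non-matching values with NaN but keeps the rows,
# so team A still forms a group whose maximum is NaN.
masked = df['variable'].where(df['another_variable'].eq(1)).groupby(df['team']).max()

print(filtered_first)
print(masked)
```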
The following appears to be a variant of what Mozway has already proposed:
import pandas as pd
df = pd.DataFrame(
{'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'variable': [8, 9, 10, 11, 2, 3, 4, 5],
'another_variable': [1, 1, 1, 2, 1, 1, 2, 2]}
)
s = (df.groupby(['team', 'another_variable'])['variable']
.max()
.reset_index(['team', 'another_variable'])
)
print(s[s['another_variable'].eq(1)])
  team  another_variable  variable
0    A                 1        10
2    B                 1         3
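The same per-pair maxima can also be sliced with Series.xs instead of resetting the index and filtering. This returns a Series indexed by team. The title asks for the "smallest maximum". Assuming that means the minimum over teams of the per-team maxima, the result can be reduced once more. This is a sketch, not necessarily what the asker intended. Note that xs raises a KeyError if no row at all has another_variable equal to 1.

```python
import pandas as pd

df = pd.DataFrame(
    {'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
     'variable': [8, 9, 10, 11, 2, 3, 4, 5],
     'another_variable': [1, 1, 1, 2, 1, 1, 2, 2]}
)

# Maximum of 'variable' for every (team, another_variable) pair.
pair_max = df.groupby(['team', 'another_variable'])['variable'].max()

# Cross-section at another_variable == 1; the matched level is dropped,
# leaving a Series indexed by team.
per_team = pair_max.xs(1, level='another_variable')

# Assumption: "smallest maximum" means the minimum of the per-team maxima.
smallest_max = per_team.min()
team_of_smallest = per_team.idxmin()

print(per_team)
print(smallest_max, team_of_smallest)
```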