How to find the smallest maximum of a column with pandas after filtering?

Question:

I have a dataframe:

import pandas as pd
df = pd.DataFrame(
    {'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
     'variable': [8, 9, 10, 11, 2, 3, 4, 5],
     'another_variable': [1, 1, 1, 2, 1, 1, 2, 2]}
)

For each team, I would like to find the largest value of variable (which counts upwards) while another_variable is still equal to 1.

I can group the data frame and filter the relevant rows:

df.groupby(['team']).apply(lambda g: g[g['another_variable'] == 1])

# Output:
#       team    variable    another_variable
#team               
#A  0   A       8           1
#   1   A       9           1
#   2   A       10          1
#B  4   B       2           1
#   5   B       3           1

But if I add .variable.min(), I only get a single value, instead of one value for each group (which I then could calculate the maximum of). What am I doing wrong?

Asked By: Maxim Moloshenko

Answers:

Filter first, then groupby:

df[df['another_variable'].eq(1)].groupby('team')['variable'].max()

Output:

team
A    10
B     3
Name: variable, dtype: int64
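
As for why the original attempt collapses to a single number: once the rows have been filtered, the result is an ordinary DataFrame rather than a grouped object, so calling .min() on its column reduces every row at once. Below is a minimal sketch of the difference, not a definitive implementation. It assumes that "smallest maximum" in the title means the minimum of the per-team maxima; the names filtered, overall_min, per_team_max and smallest_max are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
     'variable': [8, 9, 10, 11, 2, 3, 4, 5],
     'another_variable': [1, 1, 1, 2, 1, 1, 2, 2]}
)

# Filtering yields a plain DataFrame; reducing its column gives one scalar.
filtered = df[df['another_variable'].eq(1)]
overall_min = filtered['variable'].min()

# Grouping the filtered rows yields one maximum per team.
per_team_max = filtered.groupby('team')['variable'].max()

# Assumption: "smallest maximum" = minimum over the per-team maxima.
smallest_max = per_team_max.min()

print(overall_min)
print(per_team_max)
print(smallest_max)
```

The per-team result is a Series indexed by team, so any further reduction (min, max, idxmin) can be chained onto it directly.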

If there is a possibility that a group has no 1 and you’d like to have NaN, then use:

df['variable'].where(df['another_variable'].eq(1)).groupby(df['team']).max()

Example if there was no 1 in A:

team
A    NaN
B    3.0
Name: variable, dtype: float64
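
To reproduce that case, here is a small sketch on made-up data in which team A never has another_variable equal to 1. The frame df_no_one and its values are hypothetical and not taken from the question. It compares the where-based version with the filter-first version:

```python
import pandas as pd

# Hypothetical data: team A never has another_variable == 1.
df_no_one = pd.DataFrame(
    {'team': ['A', 'A', 'B', 'B'],
     'variable': [8, 9, 2, 3],
     'another_variable': [2, 2, 1, 1]}
)

# where() keeps every row but replaces non-matching values with NaN,
# so team A survives the groupby with a NaN result.
masked = df_no_one['variable'].where(df_no_one['another_variable'].eq(1))
per_team_max = masked.groupby(df_no_one['team']).max()

# Filtering first removes team A's rows entirely, so A is absent.
filter_first = (df_no_one[df_no_one['another_variable'].eq(1)]
                .groupby('team')['variable']
                .max())

print(per_team_max)
print(filter_first)
```

Because NaN cannot be stored in an integer column, the where-based result is converted to floats.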
Answered By: mozway

The following appears to be a variant of what mozway has already proposed:

import pandas as pd

df = pd.DataFrame(
    {'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
     'variable': [8, 9, 10, 11, 2, 3, 4, 5],
     'another_variable': [1, 1, 1, 2, 1, 1, 2, 2]}
)

s = (df.groupby(['team', 'another_variable'])['variable']
       .max()
       .reset_index(['team', 'another_variable'])
       )

print(s[s['another_variable'].eq(1)])

Output:

  team  another_variable  variable
0    A                 1        10
2    B                 1         3
Answered By: Laurent B.