Check if a value exists per group and remove groups without this value in a pandas df

Question:

I have a pandas df that looks like this:

import pandas as pd
d = {'value1': [1, 1, 1, 2, 3, 3, 4, 4, 4, 4], 'value2': ['A', 'B', 'C', 'C', 'A', 'B', 'B', 'A', 'A', 'B']}
df = pd.DataFrame(data=d)
df

Per group in column value1 I would like to check if that group contains at least one value ‘C’ in column value2. If a group doesn’t have a ‘C’ value, I would like to exclude that group

    value1  value2
    1       A
    1       B
    1       C
    2       C
    3       A
    3       B
    4       B
    4       A
    4       A
    4       B

The resulting df should look like this:

    value1  value2
    1       A
    1       B
    1       C
    2       C

What’s the best way to achieve this?

Asked By: sampeterson

||

Answers:

use groupby filter

df.groupby('value1').filter(lambda x: x['value2'].eq('C').sum() > 0)
Answered By: Panda Kim

Here is another solution:

  • First establish a list containing the value1 values for which value2 is equal to C: mylist = df[df.value2=='C']['value1'].unique()
  • Then filter the dataframe df, keeping only rows for which value1 is in this list: df[df.value1.isin(mylist)].

Or as a one-liner:

df[df.value1.isin(df[df.value2=='C']['value1'].unique())]

Answered By: Sheldon
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.