Remove group from pandas df if at least one group member consistently meets condition

Question:

I have a pandas dataframe that looks like this:

import pandas as pd
d = {'name': ['peter', 'peter', 'peter', 'peter', 'peter', 'peter', 'david', 'david', 'david', 'david', 'david', 'david'], 
     'class': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C'], 
     'value': [2, 0, 3, 5, 0, 0, 4, 7, 0, 9, 1, 0]}
df = pd.DataFrame(data=d)
df

name    class   value
peter   A       2
peter   B       0
peter   A       3
peter   B       5
peter   A       0
peter   B       0
david   A       4
david   B       7
david   C       0
david   A       9
david   B       1
david   C       0

I would like to group this dataframe by name and class and delete a whole name group if at least one of its classes consists only of zeros. In the example above, all of david's C values equal 0. For that reason, I would like to remove the david group and keep the peter group; see the desired output below. Any advice on how to achieve this?

name    class   value
peter   A       2
peter   B       0
peter   A       3
peter   B       5
peter   A       0
peter   B       0
Asked By: sampeterson


Answers:

Solution

# is value zero?
df['is_zero'] = df['value'] == 0

# Check combination of name and class which has all zeros
all_zeros = df.groupby(['name', 'class'], as_index=False)['is_zero'].all()

# filter the names for all such combinations
names = all_zeros.loc[all_zeros['is_zero'], 'name']

# Query the dataframe to exclude such names
result = df.query('name not in @names')

Result

    name class  value  is_zero
0  peter     A      2    False
1  peter     B      0     True
2  peter     A      3    False
3  peter     B      5    False
4  peter     A      0     True
5  peter     B      0     True
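
If the extra is_zero column is unwanted in the result, the same idea can be written without modifying df. This is a minimal sketch, not part of the original answer; the names all_zero and bad_names are introduced here for illustration only.

```python
import pandas as pd

d = {'name': ['peter'] * 6 + ['david'] * 6,
     'class': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C'],
     'value': [2, 0, 3, 5, 0, 0, 4, 7, 0, 9, 1, 0]}
df = pd.DataFrame(data=d)

# One boolean per (name, class) pair: True if every value in the pair is zero
all_zero = df['value'].eq(0).groupby([df['name'], df['class']]).all()

# Names that own at least one all-zero class (illustrative variable name)
bad_names = all_zero[all_zero].index.get_level_values('name').unique()

# Keep only the rows whose name is not in that set; df itself is left unchanged
result = df[~df['name'].isin(bad_names)]
print(result)
```

Grouping the boolean Series by the two columns keeps the helper flag out of df, so the result has only the original three columns.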
Answered By: Shubham Sharma

Use groupby.filter to collect the names that have an all-zero class, then exclude those names with a boolean mask:

names = df.groupby(["name", "class"]).filter(lambda g: g.value.eq(0).all())["name"]
df[~df["name"].isin(names)]

    name class  value
0  peter     A      2
1  peter     B      0
2  peter     A      3
3  peter     B      5
4  peter     A      0
5  peter     B      0

Or just in one command:

df[
    ~df["name"].isin(
        df.groupby(["name", "class"]).filter(lambda g: g.value.eq(0).all())["name"]
    )
]
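
A lambda-free variant is sketched below, assuming a row-aligned boolean mask is acceptable. It uses transform instead of filter; the names zero_class and drop are illustrative and do not come from the answer above.

```python
import pandas as pd

d = {"name": ["peter"] * 6 + ["david"] * 6,
     "class": ["A", "B", "A", "B", "A", "B", "A", "B", "C", "A", "B", "C"],
     "value": [2, 0, 3, 5, 0, 0, 4, 7, 0, 9, 1, 0]}
df = pd.DataFrame(data=d)

is_zero = df["value"].eq(0)

# True on every row whose (name, class) group is entirely zero
zero_class = is_zero.groupby([df["name"], df["class"]]).transform("all")

# True on every row whose name has at least one such class
drop = zero_class.groupby(df["name"]).transform("any")

result = df[~drop]
print(result)
```

Both steps stay vectorized, which may matter on large frames where a Python-level lambda per group becomes slow.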
Answered By: SomeDude

One option is to loop over the unique names in the dataframe and, within each name, over its unique classes. If any class of a name has only zero values, that name is skipped; otherwise it is added to a list of valid names. The dataframe is then filtered with that list, like this:

import pandas as pd

# your data
d = {'name': ['peter', 'peter', 'peter', 'peter', 'peter', 'peter', 'david', 'david', 'david', 'david', 'david', 'david'], 
     'class': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C'], 
     'value': [2, 0, 3, 5, 0, 0, 4, 7, 0, 9, 1, 0]}

df = pd.DataFrame(data=d)

valid_names = []

# loop over unique names in the dataframe
for name in df['name'].unique():
    name_df = df[df['name'] == name]
    has_zeros = False
    
    # loop over unique classes in the name_df
    for cls in name_df['class'].unique():
        cls_df = name_df[name_df['class'] == cls]
        if (cls_df['value'] == 0).all():
            has_zeros = True
            break
    
    # if any class has all zero values, skip the name
    if has_zeros:
        continue
    
    # add the name to the list of valid names
    valid_names.append(name)

# filter the dataframe based on the valid names
filtered = df[df['name'].isin(valid_names)]

# display the filtered dataframe
print(filtered)
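
The same loop logic can be condensed by iterating over the (name, class) groups directly. This is only a sketch of that idea; bad_names is an illustrative name, and it collects the names to drop rather than the names to keep.

```python
import pandas as pd

d = {'name': ['peter'] * 6 + ['david'] * 6,
     'class': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C'],
     'value': [2, 0, 3, 5, 0, 0, 4, 7, 0, 9, 1, 0]}
df = pd.DataFrame(data=d)

# Collect every name that has at least one class whose values are all zero
bad_names = {
    name
    for (name, cls), group in df.groupby(['name', 'class'])
    if (group['value'] == 0).all()
}

# Keep the rows whose name was not collected
filtered = df[~df['name'].isin(bad_names)]
print(filtered)
```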
Answered By: Beatdown

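Nest one filter inside another, as a one-liner. The inner filter keeps only the classes that are entirely zero; calling .all() on what is left gives True when nothing is left (peter) and False when zeros are left (david):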

output = (
    df.groupby('name').filter(
        lambda x: x.groupby('class')['value'].filter(
            lambda y: y.eq(0).all()
        ).all()
    )
)

Output:

    name class  value
0  peter     A      2
1  peter     B      0
2  peter     A      3
3  peter     B      5
4  peter     A      0
5  peter     B      0
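
Why the nested filter works is not obvious at first glance, so the two facts it relies on are shown in isolation here (a small sketch added for illustration): .all() on an empty Series is True, and a Series of zeros is falsy.

```python
import pandas as pd

# peter: no class is all-zero, so the inner filter returns an empty Series
empty = pd.Series([], dtype="int64")
print(empty.all())   # True  -> the peter group is kept

# david: class C survives the inner filter as [0, 0]
zeros = pd.Series([0, 0])
print(zeros.all())   # False -> the david group is dropped
```

Because the test relies on zeros being falsy, the trick does not carry over unchanged to other conditions, for example removing names that have a class made up entirely of 5s.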
Answered By: BeRT2me