Remove group from pandas df if at least one group member consistently meets condition
Question:
I have a pandas dataframe that looks like this:
import pandas as pd
d = {'name': ['peter', 'peter', 'peter', 'peter', 'peter', 'peter', 'david', 'david', 'david', 'david', 'david', 'david'],
'class': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C'],
'value': [2, 0, 3, 5, 0, 0, 4, 7, 0, 9, 1, 0]}
df = pd.DataFrame(data=d)
df
name class value
peter A 2
peter B 0
peter A 3
peter B 5
peter A 0
peter B 0
david A 4
david B 7
david C 0
david A 9
david B 1
david C 0
I would like to group this dataframe by name and class and delete a whole name group if at least one group member constantly equals 0. In the example above, all C's of the david group equal 0. For that reason, I would like to remove the david group and keep the peter group, see desired output below. Any advice on how to achieve this?
name class value
peter A 2
peter B 0
peter A 3
peter B 5
peter A 0
peter B 0
Answers:
Solution
# is value zero?
df['is_zero'] = df['value'] == 0
# Check combination of name and class which has all zeros
all_zeros = df.groupby(['name', 'class'], as_index=False)['is_zero'].all()
# filter the names for all such combinations
names = all_zeros.loc[all_zeros['is_zero'], 'name']
# Query the dataframe to exclude such names
result = df.query('name not in @names')
Result
name class value is_zero
0 peter A 2 False
1 peter B 0 True
2 peter A 3 False
3 peter B 5 False
4 peter A 0 True
5 peter B 0 True
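If the extra is_zero column in the result is unwanted, the same idea can probably be expressed without modifying df by using transform. The following is only a sketch on the question's data; the variable names all_zero_class and bad_name are made up for illustration.

```python
import pandas as pd

d = {'name': ['peter'] * 6 + ['david'] * 6,
     'class': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C'],
     'value': [2, 0, 3, 5, 0, 0, 4, 7, 0, 9, 1, 0]}
df = pd.DataFrame(data=d)

# True on every row whose (name, class) group contains only zeros
all_zero_class = df['value'].eq(0).groupby([df['name'], df['class']]).transform('all')

# True on every row whose name has at least one all-zero class
bad_name = all_zero_class.groupby(df['name']).transform('any')

# keep only the rows of names without an all-zero class; df itself is unchanged
result = df[~bad_name]
print(result)
```

Because both masks are aligned to the original index, no merge or isin lookup is needed.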
Use groupby.filter to get the names that meet the condition, then exclude them with a boolean mask:
names = df.groupby(["name", "class"]).filter(lambda g: g.value.eq(0).all())["name"]
df[~df["name"].isin(names)]
name class value
0 peter A 2
1 peter B 0
2 peter A 3
3 peter B 5
4 peter A 0
5 peter B 0
Or just in one command:
df[
~df["name"].isin(
df.groupby(["name", "class"]).filter(lambda g: g.value.eq(0).all())["name"]
)
]
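To see what the filter step contributes, it may help to look at the intermediate result. A small sketch on the question's data follows; the names all_zero_rows, names_to_drop and kept are only illustrative.

```python
import pandas as pd

d = {'name': ['peter'] * 6 + ['david'] * 6,
     'class': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C'],
     'value': [2, 0, 3, 5, 0, 0, 4, 7, 0, 9, 1, 0]}
df = pd.DataFrame(data=d)

# filter keeps the rows of every (name, class) group for which the lambda is True,
# i.e. the rows of groups whose values are all zero
all_zero_rows = df.groupby(['name', 'class']).filter(lambda g: g['value'].eq(0).all())
print(all_zero_rows)

# the names owning such a group are the ones to drop
names_to_drop = all_zero_rows['name'].unique()

# boolean mask: keep rows whose name is not in that list
kept = df[~df['name'].isin(names_to_drop)]
print(kept)
```

Here the intermediate frame contains only the two david/C rows, which is why masking on the name column removes the whole david group.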
You can loop over the unique names in the dataframe and, within each name, loop over its unique classes. If any class of a name has only zero values, that name is skipped; otherwise the name is added to a list of valid names. Finally, the dataframe is filtered to keep only the valid names, like this:
import pandas as pd
# your data
d = {'name': ['peter', 'peter', 'peter', 'peter', 'peter', 'peter', 'david', 'david', 'david', 'david', 'david', 'david'],
'class': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C'],
'value': [2, 0, 3, 5, 0, 0, 4, 7, 0, 9, 1, 0]}
df = pd.DataFrame(data=d)
valid_names = []
# loop over unique names in the dataframe
for name in df['name'].unique():
    name_df = df[df['name'] == name]
    has_zeros = False
    # loop over unique classes in the name_df
    for cls in name_df['class'].unique():
        cls_df = name_df[name_df['class'] == cls]
        if (cls_df['value'] == 0).all():
            has_zeros = True
            break
    # if any class has all zero values, skip the name
    if has_zeros:
        continue
    # add the name to the list of valid names
    valid_names.append(name)
# filter the dataframe based on the valid names
filtered = df[df['name'].isin(valid_names)]
# display the filtered dataframe
print(filtered)
Filtering the filter, one liner:
output = (
df.groupby('name').filter(
lambda x: x.groupby('class')['value'].filter(
lambda y: y.eq(0).all()
).all()
)
)
Output:
name class value
0 peter A 2
1 peter B 0
2 peter A 3
3 peter B 5
4 peter A 0
5 peter B 0
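The nested filter works because the inner filter returns only the values of all-zero classes: for a name without such a class the result is empty, and .all() of an empty Series is True, while for a name with such a class it contains zeros, so .all() is False. A possibly more readable sketch of the same check on the question's data (the name output_explicit is just illustrative):

```python
import pandas as pd

d = {'name': ['peter'] * 6 + ['david'] * 6,
     'class': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C'],
     'value': [2, 0, 3, 5, 0, 0, 4, 7, 0, 9, 1, 0]}
df = pd.DataFrame(data=d)

output_explicit = df.groupby('name').filter(
    # per class: are all values zero?  .any(): does that hold for some class?
    # keep the name only if no class is all zero
    lambda g: not g['value'].eq(0).groupby(g['class']).all().any()
)
print(output_explicit)
```

This variant states the condition directly instead of relying on the behaviour of .all() on an empty result.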