Filtering DataFrame on groups where all elements of one group fullfills a one of various conditions
Question:
I need to filter a data frame with different groups. The data frame looks as follows:
df = pd.DataFrame({"group":[1,1,1,
2,2,2,2,
3,3,3,
4,4],
"percentage":[70,70,70,
45,80,60,70,
71,85,90,
np.nan, np.nan]})
My goal is to return a data frame containing only groups that satisfy one of the two following conditions:
- All observations of the group have percentage > 70
- All observations of the group are np.nan
I know that I have to group the data frame first and then apply the conditions. This might be easily done using a for loop for groups. However, using such a solution might be very slow. Any help would be appreciated.
Answers:
You can try with filter
df = df.groupby('group').filter(lambda x : x['percentage'].gt(70).all() | x['percentage'].isna().all() )
Out[25]:
group percentage
7 3 71.0
8 3 85.0
9 3 90.0
10 4 NaN
11 4 NaN
USE- df[(df.percentage > 70) | df['percentage'].isnull()]
Output-
group percentage
4 2 80.0
7 3 71.0
8 3 85.0
9 3 90.0
10 4 NaN
11 4 NaN
I need to filter a data frame with different groups. The data frame looks as follows:
df = pd.DataFrame({"group":[1,1,1,
2,2,2,2,
3,3,3,
4,4],
"percentage":[70,70,70,
45,80,60,70,
71,85,90,
np.nan, np.nan]})
My goal is to return a data frame containing only groups that satisfy one of the two following conditions:
- All observations of the group have percentage > 70
- All observations of the group are np.nan
I know that I have to group the data frame first and then apply the conditions. This might be easily done using a for loop for groups. However, using such a solution might be very slow. Any help would be appreciated.
You can try with filter
df = df.groupby('group').filter(lambda x : x['percentage'].gt(70).all() | x['percentage'].isna().all() )
Out[25]:
group percentage
7 3 71.0
8 3 85.0
9 3 90.0
10 4 NaN
11 4 NaN
USE- df[(df.percentage > 70) | df['percentage'].isnull()]
Output-
group percentage
4 2 80.0
7 3 71.0
8 3 85.0
9 3 90.0
10 4 NaN
11 4 NaN