Filter a pandas DataFrame by multiple group-level conditions (all values in one column are 0 and all values in another column equal x)
Question:
I have a dataframe
df = pd.DataFrame([["A",0,"ret"],["C",2,"rem"],["B",1,"ret"],["A",0,"rem"],["B",0,"rem"],["D",0,"rem"],["C",2,"rem"],["D",0,"rem"],["D",0,"rem"]],columns=["id","val1","val2"])
id val1 val2
A 0 ret
C 2 rem
B 1 ret
A 0 rem
B 0 rem
D 0 rem
C 2 rem
D 0 rem
D 0 rem
I want to remove every id group in which val1 is 0 in all rows and val2 is rem in all rows. Here, for id D, val1 is 0 in every row and val2 is rem in every row, so group D should be removed.
Expected Output
df_out = pd.DataFrame([["A",0,"ret"],["C",2,"rem"],["B",1,"ret"],["A",0,"rem"],["B",0,"rem"],["C",2,"rem"]],columns=["id","val1","val2"])
id val1 val2
A 0 ret
C 2 rem
B 1 ret
A 0 rem
B 0 rem
C 2 rem
How can I do this in pandas?
Answers:
You can use boolean indexing with two masks:
# is there at least one non-zero val1 in the group?
m1 = df['val1'].ne(0).groupby(df['id']).transform('any')
# is there at least one val2 other than 'rem' in the group?
m2 = df['val2'].ne('rem').groupby(df['id']).transform('any')
# keep the rows where either mask is True
out = df[m1|m2]
Output:
id val1 val2
0 A 0 ret
1 C 2 rem
2 B 1 ret
3 A 0 rem
4 B 0 rem
6 C 2 rem
Intermediates:
id val1 val2 m1 m2
0 A 0 ret False True
1 C 2 rem True False
2 B 1 ret True True
3 A 0 rem False True
4 B 0 rem True True
5 D 0 rem False False
6 C 2 rem True False
7 D 0 rem False False
8 D 0 rem False False
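The intermediates table above can be reproduced by attaching both masks to the frame as columns. This is a minimal sketch that assumes the sample frame from the question; the name `debug` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 0, "ret"], ["C", 2, "rem"], ["B", 1, "ret"],
     ["A", 0, "rem"], ["B", 0, "rem"], ["D", 0, "rem"],
     ["C", 2, "rem"], ["D", 0, "rem"], ["D", 0, "rem"]],
    columns=["id", "val1", "val2"],
)

# True on every row of a group that has at least one non-zero val1
m1 = df["val1"].ne(0).groupby(df["id"]).transform("any")
# True on every row of a group that has at least one val2 other than "rem"
m2 = df["val2"].ne("rem").groupby(df["id"]).transform("any")

# Attach the masks as columns to inspect them side by side (illustrative name)
debug = df.assign(m1=m1, m2=m2)
print(debug)

# Keep the rows of every group for which either mask is True
out = df[m1 | m2]
print(out)
```

Only group D has both masks False, so it is the only group dropped.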
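You can build one row-level boolean mask that combines both conditions, broadcast it to the whole group with transform('all') so it is True only when every row of the group matches, then invert it to keep the remaining groups: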
>>> df[~(df['val1'].eq(0) & df['val2'].eq('rem')).groupby(df['id']).transform('all')]
id val1 val2
0 A 0 ret
1 C 2 rem
2 B 1 ret
3 A 0 rem
4 B 0 rem
6 C 2 rem
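The one-liner above may be easier to follow when split into named steps. A sketch assuming the sample frame from the question; the names `row_match` and `drop` are illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 0, "ret"], ["C", 2, "rem"], ["B", 1, "ret"],
     ["A", 0, "rem"], ["B", 0, "rem"], ["D", 0, "rem"],
     ["C", 2, "rem"], ["D", 0, "rem"], ["D", 0, "rem"]],
    columns=["id", "val1", "val2"],
)

# Row level: does this row meet both removal conditions?
row_match = df["val1"].eq(0) & df["val2"].eq("rem")

# Group level: True on every row of a group only if all of its rows match
drop = row_match.groupby(df["id"]).transform("all")

# Invert the mask to keep every group that should not be removed
out = df[~drop]
print(out)
```

A single combined mask is enough here because "every row has val1 equal to 0 and val2 equal to rem" is the same statement as "all val1 are 0 and all val2 are rem".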
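Another possible solution, iterating over the groups and concatenating the ones to keep:
g = df.groupby('id')
# keep a group unless all of its val1 are 0 and all of its val2 are 'rem'
pd.concat([grp for _, grp in g
           if not (grp['val1'].eq(0).all() and grp['val2'].eq('rem').all())])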
Output:
id val1 val2
0 A 0 ret
3 A 0 rem
2 B 1 ret
4 B 0 rem
1 C 2 rem
6 C 2 rem
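Note that the concatenated result is ordered by group (A, B, C) rather than by original row position. If the original order matters, one option is `DataFrameGroupBy.filter`, which returns the kept rows in their original order. A sketch assuming the sample frame from the question; the helper name `keep_group` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 0, "ret"], ["C", 2, "rem"], ["B", 1, "ret"],
     ["A", 0, "rem"], ["B", 0, "rem"], ["D", 0, "rem"],
     ["C", 2, "rem"], ["D", 0, "rem"], ["D", 0, "rem"]],
    columns=["id", "val1", "val2"],
)


def keep_group(g: pd.DataFrame) -> bool:
    """Return False only when all val1 are 0 and all val2 are 'rem'."""
    return not (g["val1"].eq(0).all() and g["val2"].eq("rem").all())


# filter() calls keep_group once per id and keeps the rows of passing groups
out = df.groupby("id").filter(keep_group)
print(out)
```

Calling `sort_index()` on the concatenated result would be another way to restore the original row order.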
Solution without groupby, using Series.isin:
df = df[df['id'].isin(df.loc[df['val1'].ne(0) | df['val2'].ne('rem'), 'id'])]
print(df)
id val1 val2
0 A 0 ret
1 C 2 rem
2 B 1 ret
3 A 0 rem
4 B 0 rem
6 C 2 rem