Select the dataframe based on multiple conditions on a group like all values in a column are 0 and value = x in another column in pandas

Question:

I have a dataframe

df = pd.DataFrame([["A",0,"ret"],["C",2,"rem"],["B",1,"ret"],["A",0,"rem"],["B",0,"rem"],["D",0,"rem"],["C",2,"rem"],["D",0,"rem"],["D",0,"rem"]],columns=["id","val1","val2"])
id val1 val2
A   0   ret
C   2   rem
B   1   ret
A   0   rem
B   0   rem
D   0   rem
C   2   rem
D   0   rem
D   0   rem

Remove the id group where val1 is 0 in all the rows of group and val2 is rem in all the rows of group. Here for id D, val1 is 0 for all the rows and val2 is rem for all the rows so remove D id.

Expected Output

df_out = pd.DataFrame([["A",0,"ret"],["C",2,"rem"],["B",1,"ret"],["A",0,"rem"],["B",0,"rem"],["C",2,"rem"]],columns=["id","val1","val2"])
id val1 val2
A   0   ret
C   2   rem
B   1   ret
A   0   rem
B   0   rem
C   2   rem

How to do it in pandas?

Asked By: Chethan

||

Answers:

You can use boolean indexing with two masks:

# is there at least one non 0 per group?
m1 = df['val1'].ne(0).groupby(df['id']).transform('any')
# is there at least one non-rem?
m2 = df['val2'].ne('rem').groupby(df['id']).transform('any')

# keep is any is True
out = df[m1|m2]

Output:

  id  val1 val2
0  A     0  ret
1  C     2  rem
2  B     1  ret
3  A     0  rem
4  B     0  rem
6  C     2  rem

Intermediates:

  id  val1 val2     m1     m2
0  A     0  ret  False   True
1  C     2  rem   True  False
2  B     1  ret   True   True
3  A     0  rem  False   True
4  B     0  rem   True   True
5  D     0  rem  False  False
6  C     2  rem   True  False
7  D     0  rem  False  False
8  D     0  rem  False  False
Answered By: mozway

You can use a boolean mask for each condition then broadcast the true condition to all group members then invert the mask:

>>> df[~(df['val1'].eq(0) & df['val2'].eq('rem')).groupby(df['id']).transform('all')]

  id  val1 val2
0  A     0  ret
1  C     2  rem
2  B     1  ret
3  A     0  rem
4  B     0  rem
6  C     2  rem
Answered By: Corralien

Another possible solution:

g = df.groupby('id')

pd.concat([x[1] for x in g if
           ~(x[1]['val1'].eq(0).all() & x[1]['val2'].eq('rem').all())])

Output:

  id  val1 val2
0  A     0  ret
3  A     0  rem
2  B     1  ret
4  B     0  rem
1  C     2  rem
6  C     2  rem
Answered By: PaulS

Solution without groupby with Series.isin:

df = df[df['id'].isin(df.loc[df['val1'].ne(0) | df['val2'].ne('rem'), 'id'])]
print (df)
  id  val1 val2
0  A     0  ret
1  C     2  rem
2  B     1  ret
3  A     0  rem
4  B     0  rem
6  C     2  rem
Answered By: jezrael