Select rows for which condition applies for multiple rows in long format with groupby pandas

Question:

I have a pandas dataframe that looks like this:

import pandas as pd
foo = pd.DataFrame({'id_p': [1,1,2,2,3,3,3,4,4], 
              'id_d_b': [True, True, False, True, True, True,False,False,False], 
              'id_d_i': [False, False, True, False,False,False,True,True,True]})
foo

id_p    id_d_b  id_d_i
0   1   True    False
1   1   True    False
2   2   False   True
3   2   True    False
4   3   True    False
5   3   True    False
6   3   False   True
7   4   False   True
8   4   False   True

I would like to select the id_ps which has have at least one True at id_d_b and at least one True at id_d_i

I tried this

foo['id_d_b'] = foo['id_d_b'].astype(int)
foo['id_d_i'] = foo['id_d_i'].astype(int)

foo['has_id_d_b'] = foo.groupby('id_p')['id_d_b'].transform('max')
foo['has_id_d_i'] = foo.groupby('id_p')['id_d_i'].transform('max')

foo['result'] = foo['has_id_d_b'] + foo['has_id_d_i'] # if this is >1 then that specific id_p has at least one id_d_b and id_d_i
foo['result'] = foo.eval('result > 1')
foo

id_p    id_d_b  id_d_i  has_id_d_b  has_id_d_i  result
0   1   1   0   1   0   False
1   1   1   0   1   0   False
2   2   0   1   1   1   True
3   2   1   0   1   1   True
4   3   1   0   1   1   True
5   3   1   0   1   1   True
6   3   0   1   1   1   True
7   4   0   1   0   1   False
8   4   0   1   0   1   False

which gives the correct result, but I am looking for a "one-liner" solution

Asked By: quant

||

Answers:

You can use: groupby.any to check if any value per group is True, (eventually on a subset of columns), then ensure all columns are True with all. With isin you can get the relevant indices (for slicing or indexing if needed).

m = foo.groupby('id_p').any().all(axis=1)
# if other columns
# m = foo.groupby('id_p')[['id_d_b', 'id_d_i']].any().all(axis=1)

foo['result'] = foo['id_p'].isin(m[m].index)

For the indices:

print(m[m].index)
# Int64Index([2, 3], dtype='int64', name='id_p')

or, using groupby.transform, which will be more expensive:

foo['result'] = foo.groupby('id_p').transform('any').all(axis=1)
# if other columns
# foo['result'] = this foo.groupby('id_p')[['id_d_b', 'id_d_i']].transform('any').all(axis=1)

output:

   id_p  id_d_b  id_d_i  result
0     1    True   False   False
1     1    True   False   False
2     2   False    True    True
3     2    True   False    True
4     3    True   False    True
5     3    True   False    True
6     3   False    True    True
7     4   False    True   False
8     4   False    True   False
Answered By: mozway

Use Series.map with Series created by aggregate GroupBy.any with DataFrame.all:

foo['result'] = foo['id_p'].map(foo.groupby('id_p')[['id_d_b', 'id_d_i']].any().all(axis=1))
print (foo)
   id_p  id_d_b  id_d_i  result
0     1    True   False   False
1     1    True   False   False
2     2   False    True    True
3     2    True   False    True
4     3    True   False    True
5     3    True   False    True
6     3   False    True    True
7     4   False    True   False
8     4   False    True   False
Answered By: jezrael
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.