How can I filter a Pandas DataFrame based on whether all aggregated values in a column are True?

Question:

I have the following data

data = [
    [1, True],
    [1, True],
    [1, True],
    [1, True],
    [2, True],
    [2, False],
    [2, True],
    [3, True],
    [3, True],
    [3, True],
    [3, True],
    [4, True],
    [4, True],
    [4, False],
    [5, True],
    [5, True],
    [5, True],    
    [5, True],
]

df = pd.DataFrame(data, columns=['ids', 'accept'])

And I would like to filter out all rows whose IDs that have at least one False value in the accept column. So my result should look like (note the missing 2 and 4 IDs):

   ids accept
0   1   True
1   1   True
2   1   True
3   1   True
4   3   True
5   3   True
6   3   True
7   3   True
8   5   True
9   5   True
10  5   True
11  5   True

I was able to get a list of the IDs for which all values in the accept column are True using the groupby() and all() methods:

# Use the groupby() method to group the DataFrame by the 'ids' column
grouped = df.groupby('ids')

# Use the all() method to check whether all values in the 'accept' column are True for each group
accept_all_true = grouped['accept'].all()

But I am stuck at this point. How can I apply this grouping to my original data frame?

Asked By: jda5

||

Answers:

df.groupby('ids').filter(lambda x: x['accept'].sum() == len(x))

Or if you want use all use following code:

df.groupby('ids').filter(lambda x: x['accept'].all() == True)

same result

Answered By: Panda Kim

If performance is important you need avoid DataFrameGroupBy.filter because bad performance in larger DataFrame:

First idea is filter ids for at least one False and then fiter ouput this ids:

df1 = df[~df['ids'].isin(df.loc[~df['accept'],'ids'])]
print (df1)
    ids  accept
0     1    True
1     1    True
2     1    True
3     1    True
7     3    True
8     3    True
9     3    True
10    3    True
14    5    True
15    5    True
16    5    True
17    5    True

Or you can use GroupBy.transform for test if all Trues per groups, performance should be slowier:

df1 = df[df.groupby('ids')['accept'].transform('all').eq(df['accept'])]
Answered By: jezrael
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.