Is there a way to select interior True values for portions of a DataFrame?


I have a DataFrame that looks like the following:

import pandas as pd

df = pd.DataFrame({'a':[True]*5+[False]*5+[True]*5,'b':[False]+[True]*3+[False]+[True]*5+[False]*4+[True]})

        a      b
0    True  False
1    True   True
2    True   True
3    True   True
4    True  False
5   False   True
6   False   True
7   False   True
8   False   True
9   False   True
10   True  False
11   True  False
12   True  False
13   True  False
14   True   True

How can I select the blocks of consecutive True values in column a, keeping a block only when the interior values of column b over the same rows (every row of the block except its first and last) are all True?

I know that I could break the DataFrame apart into consecutive True regions and apply a function to each chunk, but this is part of a much larger problem with 10 million+ rows, and I don’t think such a solution would scale well.
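For reference, the chunk-by-chunk approach described above can be sketched as a plain Python loop. This is only a sketch to pin down what "interior" means (a run's rows minus its first and last row) and to serve as a correctness check on small inputs; it is not meant as the scalable solution. The helper name `interior_true_reference` is hypothetical.

```python
import pandas as pd


def interior_true_reference(a, b):
    """Hypothetical reference helper: for each run of consecutive True values
    in `a`, mark every row of the run True only if all interior rows of the
    run (every row except the first and last) are True in `b`.
    Rows where `a` is False are marked False."""
    a = list(a)
    b = list(b)
    c = [False] * len(a)
    start = 0
    while start < len(a):
        # Find the (exclusive) end of the run of equal values starting at `start`.
        end = start
        while end < len(a) and a[end] == a[start]:
            end += 1
        if a[start]:
            # Interior rows are start+1 .. end-2; an empty slice counts as all True.
            ok = all(b[start + 1:end - 1])
            for i in range(start, end):
                c[i] = ok
        start = end
    return c


df = pd.DataFrame({'a': [True]*5 + [False]*5 + [True]*5,
                   'b': [False] + [True]*3 + [False] + [True]*5 + [False]*4 + [True]})
df['c'] = interior_true_reference(df['a'], df['b'])
print(df)
```

Runs of one or two rows have no interior rows, so this sketch marks them True whenever `a` is True.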

My expected output would be the following:

        a      b      c
0    True  False   True
1    True   True   True
2    True   True   True
3    True   True   True
4    True  False   True
5   False   True  False
6   False   True  False
7   False   True  False
8   False   True  False
9   False   True  False
10   True  False  False
11   True  False  False
12   True  False  False
13   True  False  False
14   True   True  False
Asked By: Derek O



You can group the rows into runs of consecutive equal values in a, and then check the b values of each run with a function, like this:

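# label each run of consecutive equal values in column a and group by that label
groupby_consec_a = df.groupby(df.a.diff().ne(0).cumsum())
# True if every row of the run except the first and last is True
all_interior = lambda x: x.iloc[1:-1].all()
# broadcast each run's result back to its rows, keeping only runs where a is True
df['c'] = df.a & groupby_consec_a.b.transform(all_interior)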

Check whether it’s fast enough on your data. If it isn’t, the lambda, which pandas calls once per group in Python, will need to be replaced with built-in pandas operations, but that takes more code.
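One way to drop the per-group lambda is sketched below, under the assumption that "interior" means every row of a run except its first and last (as `iloc[1:-1]` does above). It flags interior rows with `shift`, marks interior rows where b is False as disqualifying, and spreads that flag over each run with the built-in `'any'` aggregation. The helper name `interior_true_vectorized` is hypothetical, and the run labels use `ne(shift())` instead of `diff().ne(0)`; both label runs of consecutive equal values the same way. No timings are claimed here; measure on your own data.

```python
import pandas as pd


def interior_true_vectorized(a: pd.Series, b: pd.Series) -> pd.Series:
    """Hypothetical helper: True on every row of a run of True in `a` whose
    interior rows (all but the first and last) are all True in `b`."""
    # Label runs of consecutive equal values in `a`: 1, 1, ..., 2, 2, ...
    run_id = a.ne(a.shift()).cumsum()

    # The first and last row of each run; every other row is interior.
    is_first = run_id.ne(run_id.shift())
    is_last = run_id.ne(run_id.shift(-1))
    is_interior = ~(is_first | is_last)

    # One interior row with b False disqualifies its whole run. The built-in
    # 'any' aggregation avoids calling a Python function per group.
    bad = is_interior & ~b
    run_has_bad = bad.groupby(run_id).transform('any')

    # Runs where `a` is False stay False regardless of `b`.
    return a & ~run_has_bad


df = pd.DataFrame({'a': [True]*5 + [False]*5 + [True]*5,
                   'b': [False] + [True]*3 + [False] + [True]*5 + [False]*4 + [True]})
df['c'] = interior_true_vectorized(df['a'], df['b'])
print(df)
```

Like the lambda version, runs of one or two rows have no interior rows, so they are marked True whenever a is True.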

Answered By: w-m