Pandas rolling: aggregate boolean values

Question:

Is there any rolling “any” function in a pandas.DataFrame? Or is there any other way to aggregate boolean values in a rolling function?

Consider:

import pandas as pd
import numpy as np

s = pd.Series([True, True, False, True, False, False, False, True])

# this works but I don't think it is clear enough - I am not
# interested in the sum but a logical or!
s.rolling(2).sum() > 0  

# What I would like to have:
s.rolling(2).any()
# AttributeError: 'Rolling' object has no attribute 'any'
s.rolling(2).agg(np.any)
# Same error! AttributeError: 'Rolling' object has no attribute 'any'

So which functions can I use when aggregating booleans? (if numpy.any does not work)
The rolling documentation at https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.rolling.html states that “a Window or Rolling sub-classed for the particular operation” is returned, which doesn’t really help.

Asked By: Raubtier

||

Answers:

This method is not implemented. The closest alternative is Rolling.apply:

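# store the result under a new name so the original s stays intact
res = s.rolling(2).apply(lambda x: x.any(), raw=False)
print (res)
0    NaN
1    1.0
2    1.0
3    1.0
4    1.0
5    0.0
6    0.0
7    1.0
dtype: float64

# fill the incomplete first window with 0 (False), then convert back to bool
res = s.rolling(2).apply(lambda x: x.any(), raw=False).fillna(0).astype(bool)
print (res)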
0    False
1     True
2     True
3     True
4     True
5    False
6    False
7     True
dtype: bool
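
If the incomplete first window should be evaluated on the values it does contain instead of being filled with False, the min_periods argument of rolling can be combined with apply. The following is a minimal sketch under that assumption; rolling_any is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

def rolling_any(s, window):
    # Hypothetical helper: logical OR over a trailing window.
    # min_periods=1 lets incomplete windows be evaluated, so no NaN appears.
    # raw=True passes a NumPy array to the lambda instead of a Series.
    rolled = s.rolling(window, min_periods=1).apply(lambda x: x.any(), raw=True)
    return rolled.astype(bool)

s = pd.Series([True, True, False, True, False, False, False, True])
result = rolling_any(s, 2)
print(result.tolist())
```

Note that the first element is True here, because the one-element window [True] contains a True value, whereas the fillna(0) variant above reports False for it.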

A better option here is to use strides: generate a 2D NumPy array of windows and process it afterwards:

s = pd.Series([True, True, False, True, False, False, False, True])

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = rolling_window(s.to_numpy(), 2)
print (a)
[[ True  True]
 [ True False]
 [False  True]
 [ True False]
 [False False]
 [False False]
 [False  True]]

print (np.any(a, axis=1))
[ True  True  True  True False False  True]

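Here the leading windows that pandas reports as NaN are omitted. To include them, prepend padding values before building the windows, Falses in this case. Padding with n values, as below, gives one more element than the original Series; padding with n - 1 values would give a result of the same length: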

n = 2
x = np.concatenate([[False] * (n), s])

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
a = rolling_window(x, n)
print (a)
[[False False]
 [False  True]
 [ True  True]
 [ True False]
 [False  True]
 [ True False]
 [False False]
 [False False]
 [False  True]]

print (np.any(a, axis=1))
[False  True  True  True  True  True False False  True]
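
The same windowing can probably be written without a hand-rolled as_strided helper. The sketch below assumes NumPy 1.20 or newer, where np.lib.stride_tricks.sliding_window_view is available, and pads with n - 1 Falses so the result lines up with the original index; the names padded, windows and result are only illustrative:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

s = pd.Series([True, True, False, True, False, False, False, True])
n = 2

# Pad with n - 1 leading Falses so there is exactly one window per element of s
padded = np.concatenate([np.zeros(n - 1, dtype=bool), s.to_numpy()])

# Read-only 2D view of shape (len(s), n); no data is copied
windows = sliding_window_view(padded, n)

# Logical OR across each window, put back on the original index
result = pd.Series(windows.any(axis=1), index=s.index)
print(result.tolist())
```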
Answered By: jezrael

You aggregate boolean values like this:

# logical or
s.rolling(2).max().astype(bool)

# logical and
s.rolling(2).min().astype(bool)

To deal with the NaN values from incomplete windows, you can use an appropriate fillna before the type conversion, or the min_periods argument of rolling. Without either, astype(bool) converts NaN to True. Which option to choose depends on the logic you want to implement.
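
As a rough illustration of the two options, the sketch below applies both to the Series from the question. The variable names are only illustrative, and filling with 0 (False) is one possible choice, not the only one:

```python
import pandas as pd

s = pd.Series([True, True, False, True, False, False, False, True])

# Option 1: treat an incomplete window as False by filling NaN before the cast
any_fill = s.rolling(2).max().fillna(0).astype(bool)
all_fill = s.rolling(2).min().fillna(0).astype(bool)

# Option 2: evaluate incomplete windows on the values that are available
any_mp = s.rolling(2, min_periods=1).max().astype(bool)
all_mp = s.rolling(2, min_periods=1).min().astype(bool)

print(any_fill.tolist())
print(any_mp.tolist())
```

The two options differ only in the first element here: with fillna(0) it is False, with min_periods=1 it is the value of s[0].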

It is a pity this cannot be done in pandas without creating intermediate values as floats.

Answered By: adr

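def function1(ss: pd.Series):
    # ss is the current window; write any() of that window into 'col1'
    # for every row of the window (a later window overwrites an earlier one)
    df1.loc[ss.index, 'col1'] = any(df1.loc[ss.index].col)
    return 0  # apply needs a numeric return value; it is not used

df1 = pd.DataFrame(s, columns=['col'])
# the rolling result itself is discarded; function1 fills df1['col1'] as a side effect
df1.assign(col1=df1.index).col.rolling(2).apply(function1)
df1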


    col   col1
0   True   True
1   True   True
2  False   True
3   True   True
4  False  False
5  False  False
6  False   True
7   True   True
Answered By: G.G