Rolling windows in Pandas: How to wrap around with DatetimeIndex?

Question:

I have a DataFrame where the index is a DatetimeIndex with a daily frequency. It contains 365 rows, one for each day of the year. When computing rolling sums, the first few elements are always NaN (as expected), but I’d like them to have actual values.

For example, if using a rolling window of 3 samples, the value for Jan 1 should be the sum of Dec 30, Dec 31, and Jan 1. Similarly, the value for Jan 2 should be the sum of Dec 31, Jan 1, and Jan 2.

I have looked into all parameters of the rolling function in Pandas, and could not find anything that would provide this wrapping. Any help would be appreciated. The code below is a minimal example illustrating the behavior of rolling.

import numpy as np
import pandas as pd

fake_data = pd.DataFrame(index=pd.date_range('2022-1-1', '2022-12-31', freq='D'),
                         data=np.random.random(365))
rolling_fake_data = fake_data.rolling(3).sum()
Asked By: afb

||

Answers:

You can just pass to rolling the min_periods argument. It would look like

import numpy as np
import pandas as pd

fake_data = pd.DataFrame(index=pd.date_range('2022-1-1', '2022-12-31', freq='D'),
                         data=np.random.random(365))
rolling_fake_data = fake_data.rolling(3, min_periods=0).sum()

What will happen is that for the first rows where you don’t have the window with its complete size, i.e. 3, the sum will be performed with the window up to that point, thus first element would be the first element itself, second element would be first+second, and from the third onwards the window would have the full size so would compute as expected.

Answered By: Eduardo Pacheco

You’re basically asking for a circular data object which has no start or end. Not sure that exists!

The best work-around I can think of is to repeat the end of the series before the beginning.

n = 3
rolling_fake_data = (
    pd.concat([fake_data[-n:], fake_data])
).rolling(n).sum()[n:]

# Test
assert(rolling_fake_data.loc["2022-01-01", 0] ==
    fake_data.loc[["2022-12-30", "2022-12-31", "2022-01-01"], 0].sum())
Answered By: Bill

You can use the rolling function with the center parameter set to True. This will ensure that the window is centered around each element, which will include the element itself and the elements before and after it. For example, if you have a window size of 3, the first element will be the sum of the first element and the two preceding it, which will be the last two elements of the previous year in your case.

import pandas as pd
import numpy as np

date_rng = pd.date_range(start='1/1/2020', end='12/31/2020', freq='D')
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0,100,size=(len(date_rng)))
df.set_index('date', inplace=True)
df['rolling_sum'] = df['data'].rolling(window=3, center=True).sum()
Answered By: Vaibhav Nellore
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.