Pandas monthly rolling operation

Question:

I figured this out while writing up the question, so I’ll post it anyway and answer it myself in case someone else needs a little help.

Problem

Suppose we have a DataFrame, df, containing this data.

import pandas as pd
from io import StringIO

data = StringIO(
"""
date          spendings  category
2014-03-25    10         A
2014-04-05    20         A
2014-04-15    10         A
2014-04-25    10         B
2014-05-05    10         B
2014-05-15    10         A
2014-05-25    10         A
"""
)

df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")

Goal

For each row, sum the spendings over every row that falls within the month leading up to it (the row itself included), ideally using DataFrame.rolling as it’s a very clean syntax.

What I have tried

df = df.rolling("M").sum()

But this raises an exception:

ValueError: <MonthEnd> is a non-fixed frequency

version: pandas==0.19.2

Asked By: Filip Kilibarda

||

Answers:

Use a fixed "D" (day) offset rather than "M". Specifically, use "30D" for a 30-day window, which is approximately one month.

df = df.rolling("30D").sum()

Initially, I intuitively jumped to using "M" as I figured it stands for one month, but now it’s clear why that doesn’t work: "M" means month end, and since months differ in length it does not describe a fixed-size window.
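For anyone who wants to run this end to end, here is a minimal sketch of the fixed-window approach on the question's data. It selects the spendings column explicitly, which is my own assumption: recent pandas versions may refuse to sum the non-numeric category column. The name rolling_30d is purely illustrative.

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_csv.
df = pd.DataFrame(
    {
        "spendings": [10, 20, 10, 10, 10, 10, 10],
        "category": ["A", "A", "A", "B", "B", "A", "A"],
    },
    index=pd.DatetimeIndex(
        ["2014-03-25", "2014-04-05", "2014-04-15", "2014-04-25",
         "2014-05-05", "2014-05-15", "2014-05-25"],
        name="date",
    ),
)

# Time-based windows require a sorted (monotonic) DatetimeIndex.
# Each window covers the 30 days ending at, and including, the current row.
rolling_30d = df["spendings"].rolling("30D").sum()
print(rolling_30d)
```

Note that by default the window is open on the left, so a row dated exactly 30 days before the current one is not counted.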

Answered By: Filip Kilibarda

As for why you cannot use aliases like “AS” or “Y” here: the “Y” offset is not “a year” but a reference to YearEnd (http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases), an anchored offset that snaps to a calendar boundary. The rolling function therefore does not get a fixed window: the span such an offset covers depends on where the date falls in the year (e.g. almost a full year from Jan 1, but a single day from Dec 30).
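To make the "non-fixed" part concrete, here is a small sketch comparing one MonthEnd step with a plain 30-day Timedelta from a few start dates. The dates and variable names are arbitrary choices for illustration, not anything from the question.

```python
import pandas as pd

# Arbitrary month-end dates, chosen only for illustration.
starts = pd.to_datetime(["2014-01-31", "2014-02-28", "2014-03-31"])

# Days covered by one MonthEnd step from each start date: varies by month.
month_end_days = [((ts + pd.offsets.MonthEnd(1)) - ts).days for ts in starts]

# Days covered by a 30-day Timedelta: the same from every start date.
fixed_days = [((ts + pd.Timedelta("30D")) - ts).days for ts in starts]

print(month_end_days)
print(fixed_days)
```

The first list changes with the calendar while the second does not, and that difference is what the "non-fixed frequency" error is complaining about.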

The proposed solution (a 30D offset) works if you do not need strict calendar months. If you do, you can instead iterate over your date index and slice with a calendar-aware offset, which gives you more precise control over what goes into each sum.

If you have to do it in one line (separated for readability):

df['Sum'] = [
    df.loc[
        edt - pd.tseries.offsets.DateOffset(months=1):edt, 'spendings'
    ].sum() for edt in df.index
]

            spendings  category  Sum
date
2014-03-25         10         A   10
2014-04-05         20         A   30
2014-04-15         10         A   40
2014-04-25         10         B   50
2014-05-05         10         B   50
2014-05-15         10         A   40
2014-05-25         10         A   40

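If you want to see where the calendar-month slice and the 30-day window actually disagree, the sketch below wraps the slice in a helper and compares the two. The helper name trailing_month_sum is hypothetical, and closed="both" assumes a pandas version whose rolling supports the closed parameter, so that both approaches include the left edge of the window.

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_csv.
df = pd.DataFrame(
    {"spendings": [10, 20, 10, 10, 10, 10, 10]},
    index=pd.DatetimeIndex(
        ["2014-03-25", "2014-04-05", "2014-04-15", "2014-04-25",
         "2014-05-05", "2014-05-15", "2014-05-25"],
        name="date",
    ),
)


def trailing_month_sum(frame, column, months=1):
    """Sum `column` over [date - months, date] for every date in the index."""
    offset = pd.DateOffset(months=months)
    # Label-based slicing on a sorted DatetimeIndex includes both endpoints.
    totals = [frame.loc[end - offset:end, column].sum() for end in frame.index]
    return pd.Series(totals, index=frame.index, name=column + "_sum")


calendar = trailing_month_sum(df, "spendings")

# Fixed 30-day window, closed on both sides to mirror the inclusive slice.
approx = df["spendings"].rolling("30D", closed="both").sum()

# Dates where the two approaches give different totals.
differs = calendar.index[calendar.to_numpy() != approx.to_numpy()]
print(differs)
```

On this data the only mismatch is 2014-04-25: March has 31 days, so one calendar month back reaches 2014-03-25, while 30 days back only reaches 2014-03-26.
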
Answered By: Mike