Python Pandas: Calculate moving average within group

Question:

I have a dataframe containing time series for 100 objects:

object  period  value 
1       1       24
1       2       67
...
1       1000    56
2       1       59
2       2       46
...
2       1000    64
3       1       54
...
100     1       451
100     2       153
...
100     1000    21

I want to calculate a moving average with a window of 10 for the value column within each object. I guess I have to do something like

df.groupby('object').apply(lambda ~calculate MA~) 

and then merge the resulting Series back into the original dataframe by object? I can't figure out the exact commands.

Asked By: Alexandr Kapshuk

||

Answers:

You can use rolling with transform:

df['moving'] = df.groupby('object')['value'].transform(lambda x: x.rolling(10, 1).mean())

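The 1 passed to rolling is min_periods, the minimum number of observations a window needs to produce a value. With min_periods=1, the first rows of each group get the average of the rows available so far instead of NaN.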
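
To make the effect of the minimum-periods argument concrete, here is a minimal sketch on a made-up frame (two objects, five periods each; the values and the window of 3 are assumptions chosen only to keep the output small):

```python
import pandas as pd

# Small made-up frame: two objects, five periods each.
df = pd.DataFrame({
    "object": [1] * 5 + [2] * 5,
    "period": list(range(1, 6)) * 2,
    "value": [10, 20, 30, 40, 50, 1, 2, 3, 4, 5],
})

# Window of 3 (instead of 10) so the effect is visible on a tiny frame.
# min_periods=1 averages whatever rows are available at the start of a group.
df["moving"] = (
    df.groupby("object")["value"]
      .transform(lambda s: s.rolling(window=3, min_periods=1).mean())
)

# Without min_periods, a window needs 3 rows, so the first two rows
# of each group come out as NaN.
df["moving_strict"] = (
    df.groupby("object")["value"]
      .transform(lambda s: s.rolling(window=3).mean())
)
```

Because transform returns a result with the same index as the input, the new column lines up with the original rows without any extra merge step.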

Answered By: zipa

You can call rolling on the groupby object directly:

df['moving'] = df.groupby('object').rolling(10)['value'].mean()

Newer pandas versions raise an error when this result is assigned directly to a column, so use:

df['moving'] = df.groupby('object').rolling(10)['value'].mean().reset_index(drop=True)

Answered By: Space Impact

Extending the answer from @Sandeep Kadapa:

df['moving'] = df.groupby('object').rolling(10)['value'].mean().reset_index(drop=True)

reset_index is needed because the result of df.groupby(...).rolling(...) has a MultiIndex (the group key plus the original index). Assigning it to a column directly raises TypeError: incompatible index of inserted column with frame index
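
A small sketch of what that index looks like, on made-up data with a window of 2. Note that it uses a variation on the line above: it drops only the group level (reset_index(level=0, drop=True)), which keeps the original row labels so the assignment aligns by label.

```python
import pandas as pd

df = pd.DataFrame({
    "object": [1, 1, 1, 2, 2, 2],
    "period": [1, 2, 3, 1, 2, 3],
    "value": [10, 20, 30, 1, 2, 3],
})

rolled = df.groupby("object")["value"].rolling(2).mean()

# The result is indexed by (object, original row label),
# not by the row label alone.
n_levels = rolled.index.nlevels
level_names = list(rolled.index.names)

# Dropping only the group level leaves the original row labels,
# so the column assignment aligns row by row.
df["moving"] = rolled.reset_index(level=0, drop=True)
```

Here n_levels is 2, which is the mismatch with the frame's single-level index that the error message refers to.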

Answered By: dajcs

Create the column inside a method chain:

(
    df
        .assign(
            column_name=lambda d:
                d
                    .groupby(['object'])['value']
                    .transform(lambda s: s.rolling(10).mean())
        )
)

Answered By: Ramin Melikov

The answers provided may not produce the desired results if you are grouping on multiple columns.

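The following should cut it. It drops the three group-key levels from the index of the result, so that only the original row index remains: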

df['moving'] = df.groupby(['col_1', 'col_2', 'col_3']).rolling(10)['value'].mean().droplevel(level=[0,1,2])
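
As a rough illustration with two grouping columns instead of three (the data, the window of 2, and the keys helper variable are made up for this sketch):

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": ["a", "a", "a", "a", "b", "b"],
    "col_2": ["x", "x", "y", "y", "x", "x"],
    "value": [1, 3, 10, 30, 100, 300],
})

keys = ["col_1", "col_2"]
rolled = df.groupby(keys)["value"].rolling(2).mean()

# The result has one index level per grouping column, plus the
# original row labels. Drop the grouping levels by position so
# that only the row labels remain.
df["moving"] = rolled.droplevel(level=list(range(len(keys))))
```

Building the level list from len(keys) avoids hard-coding [0, 1, 2] when the number of grouping columns changes.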

Answered By: sousben

These solutions assume the dataframe is sorted in a particular way (by object, then period). For example, if the data is organized in panels (by period, then object), the assignment will fail. A general solution that works irrespective of sort order is the following:

df.loc[:, 'value_sma_10'] = df.groupby(by='object')[['value', 'period']].rolling(window=10, min_periods=1, on='period').mean().reset_index(level='object')['value']
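
To see why the sort order matters, here is a small sketch on made-up panel-ordered data (window of 2; the column names by_position and by_label are hypothetical). In this example the positional variant does not raise, it just puts values on the wrong rows, while the label-based variant lines up with each row.

```python
import pandas as pd

# Rows ordered by period first, then object (panel layout).
df = pd.DataFrame({
    "period": [1, 1, 2, 2, 3, 3],
    "object": [1, 2, 1, 2, 1, 2],
    "value": [10, 1, 20, 2, 30, 3],
})

rolled = df.groupby("object")["value"].rolling(2, min_periods=1).mean()

# Positional assignment: the values come back grouped by object,
# so after discarding the index they land on the wrong rows.
df["by_position"] = rolled.reset_index(drop=True)

# Label-based assignment: keeping the original row labels lets
# pandas align each value with the row it was computed from.
df["by_label"] = rolled.droplevel("object")
```

The by_label column matches what groupby(...).transform(...) with the same rolling call would give, since both keep the original index.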

Answered By: mrhee