Python Pandas: Calculate moving average within group
Question:
I have a dataframe containing time series for 100 objects:
object period value
1 1 24
1 2 67
...
1 1000 56
2 1 59
2 2 46
...
2 1000 64
3 1 54
...
100 1 451
100 2 153
...
100 1000 21
I want to calculate a moving average with window 10 for the value column. I guess I have to do something like
df.groupby('object').apply(lambda ~calculate MA~)
and then merge this Series to the original dataframe by object? I can't figure out the exact commands.
Answers:
You can use rolling with transform:
df['moving'] = df.groupby('object')['value'].transform(lambda x: x.rolling(10, 1).mean())
The 1 in rolling is min_periods, the minimum number of observations a window needs to produce a value.
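To see what min_periods does, here is a small sketch on made-up data. The window is shortened to 3 and the values are invented so the result is easy to check by hand; neither comes from the question.

```python
import pandas as pd

# Toy data: two objects, five periods each (values invented for illustration).
df = pd.DataFrame({
    "object": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "period": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "value": [10, 20, 30, 40, 50, 1, 2, 3, 4, 5],
})

# Window of 3 with min_periods=1: the first rows of each object average
# whatever observations are available instead of returning NaN.
df["moving"] = (
    df.groupby("object")["value"]
    .transform(lambda s: s.rolling(window=3, min_periods=1).mean())
)

print(df)
```

Because transform returns a result with the same index as the input, the assignment lines up row by row, and the window never reaches across from one object into the next.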
You can use rolling on the groupby object directly:
df['moving'] = df.groupby('object').rolling(10)['value'].mean()
Newer pandas versions raise an error when this result is assigned directly to a column, so use:
df['moving'] = df.groupby('object').rolling(10)['value'].mean().reset_index(drop=True)
Extending the answer from @Sandeep Kadapa:
df['moving'] = df.groupby('object').rolling(10)['value'].mean().reset_index(drop=True)
reset_index is needed because the result of df.groupby(...).rolling(...) has a MultiIndex (the group key plus the original row index), and assigning it directly raises TypeError: incompatible index of inserted column with frame index
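The two-level index can be inspected directly. The sketch below uses made-up, deliberately interleaved rows and, as an alternative to reset_index(drop=True), drops only the group level with droplevel(0) so that the original row labels survive and pandas aligns the assignment by label. The toy values and the window of 2 are assumptions for illustration.

```python
import pandas as pd

# Rows are interleaved on purpose, so a purely positional assignment would be wrong.
df = pd.DataFrame({
    "object": [1, 2, 1, 2, 1, 2],
    "period": [1, 1, 2, 2, 3, 3],
    "value": [10, 1, 20, 2, 30, 3],
})

rolled = df.groupby("object")["value"].rolling(2, min_periods=1).mean()

# The result is indexed by (object, original row label).
n_levels = rolled.index.nlevels

# Dropping the group level leaves the original row labels in place.
df["moving"] = rolled.droplevel(0)

print(rolled)
print(df)
```

With reset_index(drop=True) the same data would be assigned in group order (all rows of object 1, then object 2), which only matches the frame when it is already sorted by object.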
Create the column as part of a method chain:
(
    df
    .assign(
        column_name=lambda x:
            x
            .groupby(['object'])['value']
            .transform(lambda s: s.rolling(10).mean())
    )
)
The answers provided may not produce the desired results if you are grouping on multiple columns, because the rolling result then carries one extra index level per grouping column.
Dropping all of those levels should work:
df['moving'] = df.groupby(['col_1', 'col_2', 'col_3']).rolling(10)['value'].mean().droplevel(level=[0,1,2])
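A minimal sketch of the multi-key case follows. The column names site and sensor, the values, and the window of 2 are hypothetical; the number of levels to drop is derived from the key list rather than hard-coded.

```python
import pandas as pd

keys = ["site", "sensor"]  # hypothetical grouping columns

df = pd.DataFrame({
    "site":   ["a", "a", "b", "a", "b", "b"],
    "sensor": ["x", "x", "y", "x", "y", "y"],
    "value":  [1.0, 3.0, 10.0, 5.0, 20.0, 30.0],
})

rolled = df.groupby(keys)["value"].rolling(2, min_periods=1).mean()

# Drop one index level per grouping key; the original row labels remain,
# so the assignment aligns by label.
df["moving"] = rolled.droplevel(list(range(len(keys))))

print(df)
```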
The solutions that call reset_index(drop=True) assume the dataframe is sorted by object and then by period. If the data are instead organized in panels (sorted by period, then object), the averages end up on the wrong rows. One solution that keeps the original row labels, and therefore works as long as each object's rows appear in period order, is the following:
df.loc[:, 'value_sma_10'] = df.groupby(by='object')[['period', 'value']].rolling(window=10, min_periods=1, on='period').mean().reset_index(level='object')['value']
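If the rows are in no particular order at all, one alternative (a different technique from the one-liner above, not taken from the answers) is to sort explicitly and then use transform, which keeps the original row labels. The data, the window of 2, and the column name value_sma_2 are assumptions for illustration.

```python
import pandas as pd

# Rows in arbitrary order: neither grouped by object nor sorted by period.
df = pd.DataFrame({
    "object": [1, 2, 1, 2, 2, 1],
    "period": [1, 1, 3, 2, 3, 2],
    "value":  [10, 1, 30, 2, 3, 20],
})

# Sort so that each object's rows are in period order before rolling.
ordered = df.sort_values(["object", "period"])

moving = ordered.groupby("object")["value"].transform(
    lambda s: s.rolling(2, min_periods=1).mean()
)

# moving still carries the original row labels, so pandas aligns it with df
# by label even though df itself was never reordered.
df["value_sma_2"] = moving

print(df)
```

The original frame keeps its row order; only the temporary ordered copy is sorted.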