Creating a rolling mean over time in a pivot table

Question:

I aim to make a pivot table with a rolling average over two days. I am using pivot_table() with aggfunc=’mean’ and therefore I get the average on each day. Additionally, I would need to adjust the mean function to also take the previous day into the calculation of the average. Here is some sample data:

df = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-03', '2021-01-03', '2021-01-03'],
    'Name':['Tim', 'Tim', 'Ben', 'Leo', 'Tim', 'Ben', 'Leo', 'Leo', 'Ben', 'Tim'], 
    'Ratings':[9.0, 8.0, 5.0, 3.0, 'NaN', 'NaN', 6, 5, 3, 5]})  

At the moment I only know how to perform a pivot table for the mean on each day which looks like this:

df.pivot_table(
        values='Ratings', index='Date', columns='Name',
        fill_value=0, aggfunc='mean')

But I try to figure out a way to get a moving average of two days which would look like this:

df = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03'],
    'Tim':[8.5, 8.5, 5], 
    'Ben':[5, 5, 3],
    'Leo':['NaN', 4.5, 4.66],})

Thanks a lot for your help 🙂

Asked By: fjurt

||

Answers:

First aggregate sum and counts for MultiIndex, use rolling sum for sum of sums and sum of counts and last for mean use division:

df['Date'] = pd.to_datetime(df['Date'])

df = df.pivot_table(
        values='Ratings', index='Date', columns='Name',
        aggfunc=['sum','count'])

df = df.rolling('2d').sum()
df = df.xs('sum', axis=1, level=0).div(df.xs('count', axis=1, level=0))
print (df)
Name        Ben       Leo  Tim
Date                          
2021-01-01  5.0       NaN  8.5
2021-01-02  5.0  4.500000  8.5
2021-01-03  3.0  4.666667  5.0
Answered By: jezrael
df11=df.loc[df.Ratings!='NaN'].assign(Date=pd.to_datetime(df.Date)).pivot_table(
    values='Ratings', index='Date', columns='Name',
    fill_value=0, aggfunc=['count']).rolling('2d').sum().droplevel(0,axis=1)

df12=df.loc[df.Ratings!='NaN'].assign(Date=pd.to_datetime(df.Date)).pivot_table(
    values='Ratings', index='Date', columns='Name',
    fill_value=0, aggfunc=['sum']).rolling('2d').sum().droplevel(0,axis=1)

df12/df11


Name        Ben       Leo  Tim
Date                          
2021-01-01  5.0       NaN  8.5
2021-01-02  5.0  4.500000  8.5
2021-01-03  3.0  4.666667  5.0
Answered By: G.G
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.