pandas for loop for running average does not work

Question:

I tried to make a kind of running average – out of 90 rows, every 3 in column A should make an average that would be the same as those rows in column B.
For example:
From this:

df = pd.DataFrame({'A': [2, 3, 4, 7, 9, 8],
                   'B': [0, 0, 0, 0, 0, 0]})

to this:

df = pd.DataFrame({'A': [2, 3, 4, 7, 9, 8],
                   'B': [3, 3, 3, 8, 8, 8]})

I tried running this code:

x=0
for i in df['A']:
  if x<90:
    y = (df['A'][x]+ df['A'][(x +1)]+df['A'][(x +2)])/3
    df['B'][x] = y
    df['B'][(x+1)] = y
    df['B'][(x+2)] = y
    x=x+3
    print(y)

It prints the correct y each time, but B does not change.

I know there is a better way to do it, and if anyone knows – it would be great if they shared it. But the more important thing for me is to understand why what I wrote down doesn’t have an effect on the df.

Asked By: Niv Reznik

||

Answers:

You could group by the index divided by 3, then use transform to compute the mean of those values and assign to B:

import pandas as pd

df = pd.DataFrame({'A': [2, 3, 4, 7, 9, 8], 'B': [0, 0, 0, 0, 0, 0]})
df['B'] = df.groupby(df.index // 3)['A'].transform('mean')

Output:

   A    B
0  2  3.0
1  3  3.0
2  4  3.0
3  7  8.0
4  9  8.0
5  8  8.0

Note that this relies on the index being of the form 0, 1, 2, 3, 4, .... If that is not the case, you could either reset the index first (df = df.reset_index(drop=True)) or group on np.arange(df.shape[0]) instead (this needs import numpy as np), i.e.

df['B'] = df.groupby(np.arange(df.shape[0]) // 3)['A'].transform('mean')
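
To illustrate why the index matters, here is a small sketch with a made-up, non-default index (the labels 10, 20, ... are an assumption for the example, not from the question). Grouping on the labels puts every row in its own group, while grouping on row positions gives the intended groups of three:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are not 0, 1, 2, ...
df = pd.DataFrame({'A': [2, 3, 4, 7, 9, 8]}, index=[10, 20, 30, 40, 50, 60])

# Label-based keys: 10 // 3, 20 // 3, ... are all different,
# so each row is averaged with itself only.
wrong = df.groupby(df.index // 3)['A'].transform('mean')

# Position-based keys: 0, 0, 0, 1, 1, 1 regardless of the labels.
keys = np.arange(len(df)) // 3
df['B'] = df.groupby(keys)['A'].transform('mean')

print(df)
```

Here wrong just repeats column A as floats, whereas df['B'] holds the per-group means.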

Answered By: Nick

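import pandas as pd

batch_size = 3
# B starts as float so the averages can be stored without changing the column dtype
df = pd.DataFrame({'A': [2, 3, 4, 7, 9, 8, 9, 10], 'B': [-1.0] * 8})

# Walk the frame in chunks of batch_size rows (assumes the default 0..n-1 index)
for i in range(0, len(df), batch_size):
    # .loc slices are label-based and inclusive, so j is the last row of the chunk
    j = min(i + batch_size - 1, len(df) - 1)
    df.loc[i:j, 'B'] = df.loc[i:j, 'A'].mean()
df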

In the corner case where len(df) % batch_size != 0, this takes the average of the leftover rows.
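
As for why the loop in the question has no effect: df['B'][x] = y is chained indexing. df['B'] is evaluated first, and the assignment then goes into whatever object that expression returned. The pandas documentation warns that this object may be a temporary copy rather than the original frame, and with Copy-on-Write enabled such an assignment never updates df. Writing through a single df.loc call avoids the problem. A minimal sketch of the question's loop rewritten that way, assuming the default 0, 1, 2, ... index and the six-row example data:

```python
import pandas as pd

# Example data from the question; B is float so the averages fit its dtype.
df = pd.DataFrame({'A': [2, 3, 4, 7, 9, 8], 'B': [0.0] * 6})

for x in range(0, len(df), 3):
    # Label slice x:x+2 is inclusive, so it covers three rows.
    y = df.loc[x:x + 2, 'A'].mean()
    # One indexing operation on df itself, so the write lands in df.
    df.loc[x:x + 2, 'B'] = y

print(df)
```

Iterating with range(0, len(df), 3) also removes the need for the separate counter and the x < 90 check.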

Answered By: Duwang