How to group by user and get each row's average over previous dates in Python

Question:

import pandas as pd

df = pd.DataFrame([[11,'b',10,'2020-01-05'],
                   [11,'c',4,'2020-01-02'],
                   [11,'a',6,'2020-01-01'],
                   [22,'c',2,'2020-01-13'],
                   [22,'a',8,'2020-01-05'],
                   [33,'b',2,'2020-01-09'],
                   [33,'d',6,'2020-01-05'],
                   [33,'a',8,'2020-01-01']], columns=['user','lecture','not','date'])

Printing df gives:

   user lecture  not        date
0    11       b   10  2020-01-05
1    11       c    4  2020-01-02
2    11       a    6  2020-01-01
3    22       c    2  2020-01-13
4    22       a    8  2020-01-05
5    33       b    2  2020-01-09
6    33       d    6  2020-01-05
7    33       a    8  2020-01-01

I want to get the average of not for each user, but each row's average should cover only that row and the same user's rows on previous dates.
The result should look like this:

   user lecture  not        date       avg
0    11       b   10  2020-01-05  6.666667   ((10+4+6)/3)
1    11       c    4  2020-01-02  5.000000   ((4+6)/2)
2    11       a    6  2020-01-01  6.000000
3    22       c    2  2020-01-13  5.000000   ((2+8)/2)
4    22       a    8  2020-01-05  8.000000
5    33       b    2  2020-01-09  5.333333   ((2+6+8)/3)
6    33       d    6  2020-01-05  7.000000   ((6+8)/2)
7    33       a    8  2020-01-01  8.000000

I tried some lambda-based code, but I couldn't get the result:

grouped = df.sort_values(['user'], ascending=False).groupby('user', as_index=False).apply(lambda x: x.reset_index(drop=True))
grouped['count'] = grouped.groupby('user')['not'].transform(lambda x: x.count() - 1)
grouped['mean'] = grouped.groupby('user')['not'].transform(lambda x: x.shift(1).sum() / len(x))
Asked By: gulnur ozturk
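The transform idea in the attempt can work if each row divides the sum of its own value and the later rows of the same user by the number of those rows, instead of dividing one group total by len(x). A minimal sketch, assuming (as in the sample) that each user's rows are already ordered by date descending; suffix_mean is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[11, 'b', 10, '2020-01-05'],
                   [11, 'c', 4, '2020-01-02'],
                   [11, 'a', 6, '2020-01-01'],
                   [22, 'c', 2, '2020-01-13'],
                   [22, 'a', 8, '2020-01-05'],
                   [33, 'b', 2, '2020-01-09'],
                   [33, 'd', 6, '2020-01-05'],
                   [33, 'a', 8, '2020-01-01']],
                  columns=['user', 'lecture', 'not', 'date'])


def suffix_mean(x):
    # Hypothetical helper: mean of each value and all values after it in the group.
    # Reverse, take a running sum, reverse back: these are the suffix sums.
    suffix_sums = x[::-1].cumsum()[::-1]
    # How many values run from each position to the end of the group: n, n-1, ..., 1
    counts = np.arange(len(x), 0, -1)
    return suffix_sums / counts


# Assumes rows within each user are sorted newest date first
df['avg'] = df.groupby('user')['not'].transform(suffix_mean)
print(df)
```

For user 11 the suffix sums are 20, 10 and 6, divided by 3, 2 and 1, which matches the expected 6.666667, 5 and 6.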


Answers:

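I used a for-loop. df.loc[row, col] selects a cell (or a slice of cells) by its row label and column, which is used here both to filter and to assign values. For each row, the loop takes the not values of that row and all later rows that belong to the same user, and averages them.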

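df['avg'] = float('nan')  # initialize a float column ('' would give an object column)
for i in range(len(df)):
    # 'not' values of row i and every later row with the same user
    temp = df.loc[i:, 'not'][df.loc[i:, 'user'] == df.loc[i, 'user']]
    df.loc[i, 'avg'] = temp.mean()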

Output df: [image]

Answered By: perpetualstudent
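The loop above relies on each user's rows being sorted by date descending and on the default 0..n-1 index. If that ordering is not guaranteed, a sketch that compares dates directly could look like this. It parses the date strings with pd.to_datetime and, as an assumption, includes other rows of the same user that fall on the same date as the current row:

```python
import pandas as pd

df = pd.DataFrame([[11, 'b', 10, '2020-01-05'],
                   [11, 'c', 4, '2020-01-02'],
                   [11, 'a', 6, '2020-01-01'],
                   [22, 'c', 2, '2020-01-13'],
                   [22, 'a', 8, '2020-01-05'],
                   [33, 'b', 2, '2020-01-09'],
                   [33, 'd', 6, '2020-01-05'],
                   [33, 'a', 8, '2020-01-01']],
                  columns=['user', 'lecture', 'not', 'date'])

dates = pd.to_datetime(df['date'])
avgs = []
for i in df.index:
    # Rows of the same user dated on or before this row's date (row order does not matter)
    mask = (df['user'] == df.at[i, 'user']) & (dates <= dates.at[i])
    avgs.append(df.loc[mask, 'not'].mean())
# avgs was built in df.index order, so it lines up with the rows positionally
df['avg'] = avgs
print(df)
```

This is O(n²) like the original loop, so it suits small frames; the grouped approaches scale better.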

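Try a reversed expanding mean. Each user's rows run from newest to oldest, so reversing a group and taking an expanding (running) mean gives every row the average of its own value and all earlier-dated values; droplevel(0) removes the user level so the result aligns back to df by index: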

df['avg'] = (
    df.groupby('user')['not']
        .apply(lambda g: g[::-1].expanding().mean())
        .droplevel(0)
)

Or

df['avg'] = (
    df.loc[::-1, 'not'].groupby(df['user']).expanding().mean().droplevel(0)
)

df:

   user lecture  not        date       avg
0    11       b   10  2020-01-05  6.666667
1    11       c    4  2020-01-02  5.000000
2    11       a    6  2020-01-01  6.000000
3    22       c    2  2020-01-13  5.000000
4    22       a    8  2020-01-05  8.000000
5    33       b    2  2020-01-09  5.333333
6    33       d    6  2020-01-05  7.000000
7    33       a    8  2020-01-01  8.000000
Answered By: Henry Ecker
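Both versions above depend on the frame already being sorted by date descending within each user. A sketch that does not assume any row order sorts oldest first, then divides a per-user running sum (cumsum) by a per-user running count (cumcount() + 1); assigning the resulting Series aligns each value back to its original row by index. Rows of one user sharing a date would be accumulated in sorted row order, which is an assumption:

```python
import pandas as pd

df = pd.DataFrame([[11, 'b', 10, '2020-01-05'],
                   [11, 'c', 4, '2020-01-02'],
                   [11, 'a', 6, '2020-01-01'],
                   [22, 'c', 2, '2020-01-13'],
                   [22, 'a', 8, '2020-01-05'],
                   [33, 'b', 2, '2020-01-09'],
                   [33, 'd', 6, '2020-01-05'],
                   [33, 'a', 8, '2020-01-01']],
                  columns=['user', 'lecture', 'not', 'date'])

# Parse dates and sort each user's rows oldest first; the original index labels are kept
s = df.assign(date=pd.to_datetime(df['date'])).sort_values(['user', 'date'])
g = s.groupby('user')['not']
# Running mean = running sum / rows seen so far (cumcount starts at 0)
running_mean = g.cumsum() / (g.cumcount() + 1)
# Series assignment aligns on the index, restoring the original row order
df['avg'] = running_mean
print(df)
```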