Groupby Date and Class and Cumulative Sum – Pandas

Question:

I have a dataframe like this:

Category  |  Date      |  Val1  |  Val2  |  Val3  |
----------|------------|--------|--------|--------|
  A       | 2022-01-01 |  50    |   70   |   80   |
  B       | 2022-01-01 |  30    |   50   |   20   |
  C       | 2022-01-01 |  40    |   60   |   90   |
  A       | 2022-01-05 |  10    |   40   |   20   |
  B       | 2022-01-05 |  40    |   75   |   80   |
  C       | 2022-01-05 |  60    |   45   |   65   |
  A       | 2022-01-10 |  50    |   40   |   35   |
  B       | 2022-01-10 |  50    |   50   |   25   |
  C       | 2022-01-10 |  50    |   95   |   85   |

What I want to achieve is groupby date and class and take the cumulative sum. So expected output would be like this:

Category  |  Date      |  Val1  |  Val2  |  Val3  |
----------|------------|--------|--------|--------|
  A       | 2022-01-01 |  50    |   70   |   80   |
  B       | 2022-01-01 |  30    |   50   |   20   |
  C       | 2022-01-01 |  40    |   60   |   90   |
  A       | 2022-01-05 |  60    |   110  |   100  |
  B       | 2022-01-05 |  70    |   125  |   100  |
  C       | 2022-01-05 |  100   |   105  |   155  |
  A       | 2022-01-10 |  110   |   150  |   135  |
  B       | 2022-01-10 |  120   |   175  |   125  |
  C       | 2022-01-10 |  150   |   200  |   240  |

All numerical columns are cumsum (Val1, Val2 & Val3)

I tried to take reference from this code but couldn’t make it work in my case.

Asked By: Debadri Dutta

||

Answers:

If not groupby by date, only by Category ouput is:

cols = ['Val1','Val2','Val3']
df[cols] = df.groupby('Category')[cols].cumsum()
print (df)
  Category       Date  Val1  Val2  Val3
0        A 2022-01-01    50    70    80
1        B 2022-01-01    30    50    20
2        C 2022-01-01    40    60    90
3        A 2022-01-05    60   110   100
4        B 2022-01-05    70   125   100
5        C 2022-01-05   100   105   155
6        A 2022-01-10   110   150   135
7        B 2022-01-10   120   175   125
8        C 2022-01-10   150   200   240

If need also groupby per months and Category use:

cols = ['Val1','Val2','Val3']
df[cols] = df.groupby([pd.Grouper(key='Date', freq='M'), 'Category'])[cols].cumsum()
#alternative
#df[cols] = df.groupby([df['Date'].dt.to_period('m'), 'Category'])[cols].cumsum()
print (df)

  Category       Date  Val1  Val2  Val3
0        A 2022-01-01    50    70    80
1        B 2022-01-01    30    50    20
2        C 2022-01-01    40    60    90
3        A 2022-01-05    60   110   100
4        B 2022-01-05    70   125   100
5        C 2022-01-05   100   105   155
6        A 2022-01-10   110   150   135
7        B 2022-01-10   120   175   125
8        C 2022-01-10   150   200   240
Answered By: jezrael

You can use groupby.cumsum (on Category only, it doesn’t make sense to group by date as each group would be only one value so nothing to cumsum) and update the DataFrame:

df.update(df.groupby(['Category']).cumsum())

output

  Category        Date  Val1  Val2  Val3
0        A  2022-01-01    50    70    80
1        B  2022-01-01    30    50    20
2        C  2022-01-01    40    60    90
3        A  2022-01-05    60   110   100
4        B  2022-01-05    70   125   100
5        C  2022-01-05   100   105   155
6        A  2022-01-10   110   150   135
7        B  2022-01-10   120   175   125
8        C  2022-01-10   150   200   240
Answered By: mozway
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.