Creating cumulative product with reset to 1 at each end of the month

Question:

I try to create a new column CumReturnin a Dataframe dfwith the cumulative product over the month. I try to reset the cum_prod() to 1 at the end of each month (if EndMonth == 1) and start new with the cumulative product.

df:
Date        EndMonth  ID1  Return
2023-01-30     0       A     0.95
2023-01-30     0       B     0.98
2023-01-31     1       A     1.01
2023-01-31     1       B     1.02
2023-02-01     0       A     1.05
2023-02-01     0       B     0.92
2023-02-02     0       A     0.97
2023-02-02     0       B     0.99

I tried it to do with: df[‘CumReturn’] = np.where(df[‘EndMonth’] == 1, 1, df[‘Return’].groupby(‘ID1’).cumprod())

When I do that, I get for 2023-02-02 the cumulative product over the whole period and not only since the start of February.

For reproducability:

import pandas as pd
df1 = pd.DataFrame({
    'Date':['2023-01-30', '2023-01-30', '2023-01-31', '2023-01-31', '2023-02-01', '2023-02-01', '2023-02-02', '2023-02-02'],
    'EndMonth':[0, 0, 1, 1, 0, 0, 0, 0],
    'ID1':['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Return':[0.95, 0.98, 1.01, 1.02, 1.05, 0.92, 0.97, 0.99]})
df1 = df1.set_index('Date')

Many thanks!

Asked By: fjurt

||

Answers:

Looks like you want:

# set up grouper per ID and month
g = df1.groupby(['ID1', pd.to_datetime(df1.index).month])['Return']

# get cumprod per month with starting value being 1
# (we divide by the first value)
df1['CumProd'] = g.cumprod().div(g.transform('first'))

Output:

            EndMonth ID1  Return  CumProd
Date                                     
2023-01-30         0   A    0.95     1.00
2023-01-30         0   B    0.98     1.00
2023-01-31         1   A    1.01     1.01
2023-01-31         1   B    1.02     1.02
2023-02-01         0   A    1.05     1.00
2023-02-01         0   B    0.92     1.00
2023-02-02         0   A    0.97     0.97
2023-02-02         0   B    0.99     0.99
Answered By: mozway

IIUC use:

df1.index = pd.to_datetime(df1.index)

df1['CumReturn'] = (df1['Return'].add(1)
                                 .groupby([df1['ID1'], pd.Grouper(freq='m')])
                                 .cumprod()
                                 .sub(1))
print (df1)
            EndMonth ID1  Return  CumReturn
Date                                       
2023-01-30         0   A    0.95     0.9500
2023-01-30         0   B    0.98     0.9800
2023-01-31         1   A    1.01     2.9195
2023-01-31         1   B    1.02     2.9996
2023-02-01         0   A    1.05     1.0500
2023-02-01         0   B    0.92     0.9200
2023-02-02         0   A    0.97     3.0385
2023-02-02         0   B    0.99     2.8208
Answered By: jezrael
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.