Pandas groupby cumulative sum, mean, median based on flag values

Question:

I need to compute aggregate statistics (sum, mean, median, etc.) over runs of rows delimited by the values of Indicator.

For each True row, it should aggregate the preceding rows and print the result on that True row.
The calculation should then reset, starting from that True row (inclusive) and running until the next True appears.

Input data frame

Amount Indicator
10 False
20 False
5 True
8 False
4 False
6 True

Expected output

Amount Indicator Sum Mean
10 False
20 False
5 True 30 15
8 False
4 False
6 True 17 5.67

I tried using groupby with cumsum(), with Indicator as the group key, but have not had any luck yet.

Asked By: ALphaCS


Answers:

Let’s use the cumsum of the Indicator column as the group key, then take the cumsum and size of the Amount column within each group:

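# Each True starts a new group; for the sample data the keys are 0, 0, 1, 1, 1, 2
m = df['Indicator'].cumsum()

# Running total per group, shifted down one row so each True row shows the total of the rows before it
df['Sum'] = df.groupby(m)['Amount'].cumsum().shift().where(df['Indicator'])
# Move each Sum up to the last row of its group, divide by the group size, then shift back down
df['Mean'] = df['Sum'].shift(-1).div(df.groupby(m)['Amount'].transform('size')).shift()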
print(df)

   Amount  Indicator   Sum       Mean
0      10      False   NaN        NaN
1      20      False   NaN        NaN
2       5       True  30.0  15.000000
3       8      False   NaN        NaN
4       4      False   NaN        NaN
5       6       True  17.0   5.666667
Answered By: Ynjxsjmh
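
To see why the answer above works, it can help to look at the intermediate values. The following is a minimal sketch, not part of the original answer; the DataFrame construction is an assumption reconstructed from the table in the question, and the names key, running, and shifted are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed construction).
df = pd.DataFrame({
    "Amount": [10, 20, 5, 8, 4, 6],
    "Indicator": [False, False, True, False, False, True],
})

# Summing booleans treats True as 1, so every True row opens a new group.
key = df["Indicator"].cumsum()
print(key.tolist())

# Running total that restarts whenever the key changes.
running = df.groupby(key)["Amount"].cumsum()
print(running.tolist())

# Shifting by one row places each group's final total on the True row
# that follows it; where() blanks out every non-True row.
shifted = running.shift().where(df["Indicator"])
print(shifted.tolist())
```

Grouping by Indicator itself would only produce two groups (all False rows and all True rows), which is why the running total never resets in that case.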

In one go using only one groupby:

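import numpy as np

df[["Sum", "Mean"]] = np.nan  # the np.NaN alias was removed in NumPy 2.0
# Aggregate once per group, then shift so each True row receives the stats of the group before it
df.loc[df.Indicator, ["Sum", "Mean"]] = (df.groupby(df.Indicator.cumsum())
                                         .agg({"Amount": ["sum", "mean"]})
                                         .shift(1).iloc[1:].values)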
Answered By: bitflip
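
The question also mentions the median, which neither answer covers. The sketch below is one possible way to extend the same grouping idea to any aggregation that agg() accepts; it is not taken from either answer. The DataFrame construction and the names key, stats, prev, and looked_up are assumptions made for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed construction).
df = pd.DataFrame({
    "Amount": [10, 20, 5, 8, 4, 6],
    "Indicator": [False, False, True, False, False, True],
})

# Same group key as in the answers: each True row opens a new group.
key = df["Indicator"].cumsum()

# One row of statistics per group.
stats = df.groupby(key)["Amount"].agg(["sum", "mean", "median"])
stats.columns = ["Sum", "Mean", "Median"]

# A True row with key k closes group k - 1, so look up that group's stats.
prev = key[df["Indicator"]] - 1
looked_up = stats.reindex(prev.to_numpy())
looked_up.index = prev.index  # re-label with the positions of the True rows

# Left join on the index: non-True rows get NaN in the new columns.
result = df.join(looked_up)
print(result)
```

Because the lookup uses reindex, a True in the very first row (which has no preceding group) ends up with NaN rather than raising an error.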