Pandas groupby cumulative sum, mean, median based on flag values
Question:
I need to compute cumulative statistics (sum, mean, median, etc.) on Amount based on the values of Indicator.
The calculation should cover all the rows where Indicator is False and print the result next to the following True row.
The calculated value should then reset and start again from that True row, continuing until the next True appears in Indicator.
Input data frame:
Amount | Indicator |
---|---|
10 | False |
20 | False |
5 | True |
8 | False |
4 | False |
6 | True |
The output would be:
Amount | Indicator | Sum | Mean |
---|---|---|---|
10 | False | | |
20 | False | | |
5 | True | 30 | 15 |
8 | False | | |
4 | False | | |
6 | True | 17 | 5.6 |
I tried to use groupby with cumsum(), using Indicator as the group, but have not had any luck yet.
Answers:
Let’s use the cumsum of the Indicator column as the group key to get the cumsum and size of the Amount column:
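# Running count of True values: each True row starts a new group
m = df['Indicator'].cumsum()
# Running total within each group, shifted down one row and kept only on True rows
df['Sum'] = df.groupby(m)['Amount'].cumsum().shift().where(df['Indicator'])
# Move Sum back into the group it summarizes, divide by that group's size, then shift down again
df['Mean'] = df['Sum'].shift(-1).div(df.groupby(m)['Amount'].transform('size')).shift()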
print(df)
   Amount  Indicator   Sum       Mean
0      10      False   NaN        NaN
1      20      False   NaN        NaN
2       5       True  30.0  15.000000
3       8      False   NaN        NaN
4       4      False   NaN        NaN
5       6       True  17.0   5.666667
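To make the grouping step easier to follow, here is a minimal self-contained sketch that rebuilds the question's data and prints the intermediate values. The variable names `running` and `sum_at_true` are illustrative and do not appear in the answer above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Amount": [10, 20, 5, 8, 4, 6],
    "Indicator": [False, False, True, False, False, True],
})

# Each True bumps the running count, so a True row opens a new group
m = df["Indicator"].cumsum()
print(m.tolist())  # [0, 0, 1, 1, 1, 2]

# Running total of Amount inside each group
running = df.groupby(m)["Amount"].cumsum()
print(running.tolist())  # [10, 30, 5, 13, 17, 6]

# Shift down one row so each True row sees the total of the group that just ended,
# then blank out every row where Indicator is False
sum_at_true = running.shift().where(df["Indicator"])
print(sum_at_true.tolist())
```

Note that the second total (17) includes the earlier True row's Amount of 5, which matches the expected output in the question.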
In one go using only one groupby:
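import numpy as np

# Create the result columns, initially all NaN
df[["Sum", "Mean"]] = np.nan
# Aggregate once per group, then shift so each True row receives the previous group's values
df.loc[df.Indicator, ["Sum", "Mean"]] = (df.groupby(df.Indicator.cumsum())
                                           .agg({"Amount": ["sum", "mean"]})
                                           .shift(1).iloc[1:].values)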
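The question also mentions the median, which neither answer computes. Below is a hedged sketch of how the single-groupby approach might be extended to cover it. It assumes, like the snippet above, that the first row of the frame is not True; the `Median` column and the `stats` and `shifted` names are additions for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Amount": [10, 20, 5, 8, 4, 6],
    "Indicator": [False, False, True, False, False, True],
})

# One row of statistics per group; a new group starts at every True row
stats = df.groupby(df["Indicator"].cumsum())["Amount"].agg(["sum", "mean", "median"])

# Shift down so each group's statistics line up with the True row that follows it,
# then drop the first (all-NaN) row. Assumes the first row of df is not True.
shifted = stats.shift(1).iloc[1:]

# Create the result columns as NaN, then fill only the True rows
for col in ["Sum", "Mean", "Median"]:
    df[col] = np.nan
df.loc[df["Indicator"], ["Sum", "Mean", "Median"]] = shifted.to_numpy()
print(df)
```

Any other aggregation name accepted by `agg` (for example `"min"` or `"max"`) could be added to the list in the same way, as long as the list of target columns is extended to match.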