How to add X value to Y rows iterated
Question:
If the period is X days, I want the slush to be summed over that time period(the time period is not equal to the length of the dataframe for each slush variable).
I want a new column A to show the Slush of that row + the prior slush within an appropriate period, where in this case, period = row.
I have checked other posts, I don’t believe it is Cumsum as the value is not carried over indefinitely and instead is dropped after its period is over.
Data
Slush
Period
85
1
20
3
40
3
15
1
22
4
Desired Output:
Slush
Period
A
85
1
85
20
3
20
40
3
60
15
1
75
22
4
62
Answers:
You can use numpy for a custom cumsum:
a = np.arange(len(df))
mask = a < (a+df['Period'].to_numpy())[:,None]
df['A'] = (np.triu(mask) * df['Slush'].to_numpy()[:,None]).sum(axis=0)
Output:
Slush Period A
0 85 1 85
1 20 3 20
2 40 3 60
3 15 1 75
4 22 4 62
Intermediates:
np.triu(mask)
array([[ True, False, False, False, False],
[False, True, True, True, False],
[False, False, True, True, True],
[False, False, False, True, False],
[False, False, False, False, True]])
(np.triu(mask) * df['Slush'].to_numpy()[:,None])
array([[85, 0, 0, 0, 0],
[ 0, 20, 20, 20, 0],
[ 0, 0, 40, 40, 40],
[ 0, 0, 0, 15, 0],
[ 0, 0, 0, 0, 22]])
If the period is X days, I want the slush to be summed over that time period(the time period is not equal to the length of the dataframe for each slush variable).
I want a new column A to show the Slush of that row + the prior slush within an appropriate period, where in this case, period = row.
I have checked other posts, I don’t believe it is Cumsum as the value is not carried over indefinitely and instead is dropped after its period is over.
Data
Slush | Period |
---|---|
85 | 1 |
20 | 3 |
40 | 3 |
15 | 1 |
22 | 4 |
Desired Output:
Slush | Period | A |
---|---|---|
85 | 1 | 85 |
20 | 3 | 20 |
40 | 3 | 60 |
15 | 1 | 75 |
22 | 4 | 62 |
You can use numpy for a custom cumsum:
a = np.arange(len(df))
mask = a < (a+df['Period'].to_numpy())[:,None]
df['A'] = (np.triu(mask) * df['Slush'].to_numpy()[:,None]).sum(axis=0)
Output:
Slush Period A
0 85 1 85
1 20 3 20
2 40 3 60
3 15 1 75
4 22 4 62
Intermediates:
np.triu(mask)
array([[ True, False, False, False, False],
[False, True, True, True, False],
[False, False, True, True, True],
[False, False, False, True, False],
[False, False, False, False, True]])
(np.triu(mask) * df['Slush'].to_numpy()[:,None])
array([[85, 0, 0, 0, 0],
[ 0, 20, 20, 20, 0],
[ 0, 0, 40, 40, 40],
[ 0, 0, 0, 15, 0],
[ 0, 0, 0, 0, 22]])