Subtract two columns and show outcome in the next line
Question:
I would like to get a cumulative sum of X and also subtract X-Y in each column. The remainder should be found and added to the next row. I would also like the count to reset to 0 at the end of every month. For example, the 1st of every month should be a 0 in X.
For example:
DF1:
Date
X
Y
2021-04-25
4
4
2021-04-26
0
0
2021-04-27
0
0
2021-04-28
56
53
2021-04-29
0
0
2021-04-30
1
0
2021-05-01
0
0
2021-05-02
5
0
2021-05-03
5
5
2021-05-04
0
0
Dfdesired:
Date
X
Y
2021-04-25
4
4
2021-04-26
0
0
2021-04-27
0
0
2021-04-28
56
53
2021-04-29
3
0
2021-04-30
4
0
2021-05-01
0
0
2021-05-02
5
0
2021-05-03
10
5
2021-05-04
5
0
I have tried this for the cumulative sum but it does not seem to be working and I am unsure how to reset to 0 at the end of the month.
df1['X'] += df1['X'] - df1['Invoice Rejected']
Answers:
Looks like you want:
df['X'] = df['X'] + (df['X'] - df['Y']).cumsum().shift(1).fillna(0)
Which yields:
Date X Y
0 2021-04-25 4.0 4
1 2021-04-26 0.0 0
2 2021-04-27 0.0 0
3 2021-04-28 56.0 53
4 2021-04-29 3.0 0
5 2021-04-30 4.0 0
6 2021-05-01 4.0 0
7 2021-05-02 9.0 0
8 2021-05-03 14.0 5
9 2021-05-04 9.0 0
Try this to apply the calculation on each month:
df["X"] = (df["X"] +
df.eval("Diff = X - Y").groupby(pd.Grouper(key="Date", freq="M"))["Diff"]
.cumsum().shift().fillna(0))
Output:
Date X Y
0 2021-04-25 4.0 4
1 2021-04-26 0.0 0
2 2021-04-27 0.0 0
3 2021-04-28 56.0 53
4 2021-04-29 3.0 0
5 2021-04-30 4.0 0
6 2021-05-01 4.0 0
7 2021-05-02 5.0 0
8 2021-05-03 10.0 5
9 2021-05-04 5.0 0
Details:
- Create a "Diff" column temporary
- Use pd.Grouper to group on "Date" with a freq="M" and cumsum
- Shift and fillna
- Add back to "X"
I would like to get a cumulative sum of X and also subtract X-Y in each column. The remainder should be found and added to the next row. I would also like the count to reset to 0 at the end of every month. For example, the 1st of every month should be a 0 in X.
For example:
DF1:
Date | X | Y |
---|---|---|
2021-04-25 | 4 | 4 |
2021-04-26 | 0 | 0 |
2021-04-27 | 0 | 0 |
2021-04-28 | 56 | 53 |
2021-04-29 | 0 | 0 |
2021-04-30 | 1 | 0 |
2021-05-01 | 0 | 0 |
2021-05-02 | 5 | 0 |
2021-05-03 | 5 | 5 |
2021-05-04 | 0 | 0 |
Dfdesired:
Date | X | Y |
---|---|---|
2021-04-25 | 4 | 4 |
2021-04-26 | 0 | 0 |
2021-04-27 | 0 | 0 |
2021-04-28 | 56 | 53 |
2021-04-29 | 3 | 0 |
2021-04-30 | 4 | 0 |
2021-05-01 | 0 | 0 |
2021-05-02 | 5 | 0 |
2021-05-03 | 10 | 5 |
2021-05-04 | 5 | 0 |
I have tried this for the cumulative sum but it does not seem to be working and I am unsure how to reset to 0 at the end of the month.
df1['X'] += df1['X'] - df1['Invoice Rejected']
Looks like you want:
df['X'] = df['X'] + (df['X'] - df['Y']).cumsum().shift(1).fillna(0)
Which yields:
Date X Y
0 2021-04-25 4.0 4
1 2021-04-26 0.0 0
2 2021-04-27 0.0 0
3 2021-04-28 56.0 53
4 2021-04-29 3.0 0
5 2021-04-30 4.0 0
6 2021-05-01 4.0 0
7 2021-05-02 9.0 0
8 2021-05-03 14.0 5
9 2021-05-04 9.0 0
Try this to apply the calculation on each month:
df["X"] = (df["X"] +
df.eval("Diff = X - Y").groupby(pd.Grouper(key="Date", freq="M"))["Diff"]
.cumsum().shift().fillna(0))
Output:
Date X Y
0 2021-04-25 4.0 4
1 2021-04-26 0.0 0
2 2021-04-27 0.0 0
3 2021-04-28 56.0 53
4 2021-04-29 3.0 0
5 2021-04-30 4.0 0
6 2021-05-01 4.0 0
7 2021-05-02 5.0 0
8 2021-05-03 10.0 5
9 2021-05-04 5.0 0
Details:
- Create a "Diff" column temporary
- Use pd.Grouper to group on "Date" with a freq="M" and cumsum
- Shift and fillna
- Add back to "X"