Python pandas how to sum values by accumulation while zeroing when changing the sign (+,-)
Question:
I have a csv file with some calculations that looks something like this:
Value1
-1
-4
-5
-2
-3
-6
1
7
5
8
2
-1
2
-3
I would like to add a new column to it with a cumulative calculation that takes into account the sign in the Value1 column so that something like this would turn out:
Value1
Value2
-1
-1
-4
-5
-5
-10
-2
-12
-3
-15
-6
-21
1
1
7
8
5
13
8
21
2
23
-1
-1
2
2
-3
-3
That is, for example, while there is a negative value in the Value 1 column, there is an addition in the Value2 column (x + (-x1)) where x is the value Value1 and -x1 is the previous value in the column Value2 and when the sign in the column Value 1 is changed, the calculation begins anew
Is this possible with Python and Pandas?
Answers:
First identify where the rows change sign with a mask. Then groupby them and use cumsum
:
pos_or_neg = df['Value1'] >= 0
groups = pos_or_neg.diff().ne(0).cumsum()
print(groups)
The groups look like this:
0 1
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 2
9 2
10 2
11 3
12 4
13 5
Name: Value2, dtype: int32
Finally:
df['Value2'] = df.groupby(groups)['Value1'].cumsum()
print(df)
Output:
Value1 Value2
0 -1 -1
1 -4 -5
2 -5 -10
3 -2 -12
4 -3 -15
5 -6 -21
6 1 1
7 7 8
8 5 13
9 8 21
10 2 23
11 -1 -1
12 2 2
13 -3 -3
You could try as follows:
import pandas as pd
# read your csv
fname = 'mypath/myfile.csv'
df = pd.read_csv(fname)
# pd.Series with False for neg vals / True for pos vals
positives = df.Value1 >= 0
# pd.Series with consecutive count groups
sequences = (positives != positives.shift()).cumsum()
# now perform `groupby` on these groups, and assign to df.Value2
df['Value2'] = df.groupby(sequences)['Value1'].cumsum()
print(df)
Value1 Value2
0 -1 -1
1 -4 -5
2 -5 -10
3 -2 -12
4 -3 -15
5 -6 -21
6 1 1
7 7 8
8 5 13
9 8 21
10 2 23
11 -1 -1
12 2 2
13 -3 -3
# save to csv again
df.to_csv(fname, index=False)
I have a csv file with some calculations that looks something like this:
Value1 |
---|
-1 |
-4 |
-5 |
-2 |
-3 |
-6 |
1 |
7 |
5 |
8 |
2 |
-1 |
2 |
-3 |
I would like to add a new column to it with a cumulative calculation that takes into account the sign in the Value1 column so that something like this would turn out:
Value1 | Value2 |
---|---|
-1 | -1 |
-4 | -5 |
-5 | -10 |
-2 | -12 |
-3 | -15 |
-6 | -21 |
1 | 1 |
7 | 8 |
5 | 13 |
8 | 21 |
2 | 23 |
-1 | -1 |
2 | 2 |
-3 | -3 |
That is, for example, while there is a negative value in the Value 1 column, there is an addition in the Value2 column (x + (-x1)) where x is the value Value1 and -x1 is the previous value in the column Value2 and when the sign in the column Value 1 is changed, the calculation begins anew
Is this possible with Python and Pandas?
First identify where the rows change sign with a mask. Then groupby them and use cumsum
:
pos_or_neg = df['Value1'] >= 0
groups = pos_or_neg.diff().ne(0).cumsum()
print(groups)
The groups look like this:
0 1
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 2
9 2
10 2
11 3
12 4
13 5
Name: Value2, dtype: int32
Finally:
df['Value2'] = df.groupby(groups)['Value1'].cumsum()
print(df)
Output:
Value1 Value2
0 -1 -1
1 -4 -5
2 -5 -10
3 -2 -12
4 -3 -15
5 -6 -21
6 1 1
7 7 8
8 5 13
9 8 21
10 2 23
11 -1 -1
12 2 2
13 -3 -3
You could try as follows:
import pandas as pd
# read your csv
fname = 'mypath/myfile.csv'
df = pd.read_csv(fname)
# pd.Series with False for neg vals / True for pos vals
positives = df.Value1 >= 0
# pd.Series with consecutive count groups
sequences = (positives != positives.shift()).cumsum()
# now perform `groupby` on these groups, and assign to df.Value2
df['Value2'] = df.groupby(sequences)['Value1'].cumsum()
print(df)
Value1 Value2
0 -1 -1
1 -4 -5
2 -5 -10
3 -2 -12
4 -3 -15
5 -6 -21
6 1 1
7 7 8
8 5 13
9 8 21
10 2 23
11 -1 -1
12 2 2
13 -3 -3
# save to csv again
df.to_csv(fname, index=False)