How to reset a Sum after it reaches a milestone and then continue it in Pandas DataFrame in Python?
Question:
I’ve got one DataFrame that I would like to Sum the "Total_time" column and if it reaches a certain valeu the sum resets. The Sum need to be separated by different "ac"(using groupby) and when it reaches more than 185, the Sum is reseted.
The DataFrame:
index
ac
flight_date
Total_time
0
PR-AKA
2023-03-13 00:00:00
87
1
PR-AKA
2023-03-13 00:00:00
55
2
PR-AKA
2023-03-13 00:00:00
59
3
PR-AKA
2023-03-13 00:00:00
71
4
PR-AKA
2023-03-14 00:00:00
70
5
PR-AKA
2023-03-14 00:00:00
58
6
PR-AKA
2023-03-14 00:00:00
45
7
PR-AKA
2023-03-14 00:00:00
41
8
PR-AKA
2023-03-14 00:00:00
33
9
PR-AKA
2023-03-14 00:00:00
42
10
PR-AKA
2023-03-15 00:00:00
70
11
PR-AKB
2023-03-13 00:00:00
109
12
PR-AKB
2023-03-13 00:00:00
100
13
PR-AKB
2023-03-13 00:00:00
92
14
PR-AKB
2023-03-13 00:00:00
102
15
PR-AKB
2023-03-14 00:00:00
71
16
PR-AKB
2023-03-14 00:00:00
74
17
PR-AKB
2023-03-14 00:00:00
80
18
PR-AKB
2023-03-14 00:00:00
64
19
PR-AKB
2023-03-14 00:00:00
66
20
PR-AKB
2023-03-14 00:00:00
70
21
PR-AKB
2023-03-14 00:00:00
31
22
PR-AKB
2023-03-14 00:00:00
45
23
PR-AKB
2023-03-15 00:00:00
72
24
PR-AKB
2023-03-15 00:00:00
70
25
PR-AKB
2023-03-15 00:00:00
64
26
PR-AKB
2023-03-15 00:00:00
75
27
PR-AKB
2023-03-15 00:00:00
52
28
PR-AKB
2023-03-15 00:00:00
58
29
PR-AKB
2023-03-16 00:00:00
36
30
PR-AKB
2023-03-16 00:00:00
47
31
PR-AKB
2023-03-16 00:00:00
14
32
PR-AKC
2023-03-13 00:00:00
86
33
PR-AKC
2023-03-13 00:00:00
82
34
PR-AKC
2023-03-13 00:00:00
83
Expected result:
index
ac
flight_date
Total_time
Sum
0
PR-AKA
2023-03-13 00:00:00
87
87
1
PR-AKA
2023-03-13 00:00:00
55
142
2
PR-AKA
2023-03-13 00:00:00
59
201
3
PR-AKA
2023-03-13 00:00:00
71
71
4
PR-AKA
2023-03-14 00:00:00
70
141
5
PR-AKA
2023-03-14 00:00:00
58
199
6
PR-AKA
2023-03-14 00:00:00
45
45
7
PR-AKA
2023-03-14 00:00:00
41
86
8
PR-AKA
2023-03-14 00:00:00
33
119
9
PR-AKA
2023-03-14 00:00:00
42
161
10
PR-AKA
2023-03-15 00:00:00
70
231
11
PR-AKB
2023-03-13 00:00:00
109
109
12
PR-AKB
2023-03-13 00:00:00
100
209
13
PR-AKB
2023-03-13 00:00:00
92
92
14
PR-AKB
2023-03-13 00:00:00
102
194
15
PR-AKB
2023-03-14 00:00:00
71
71
16
PR-AKB
2023-03-14 00:00:00
74
145
17
PR-AKB
2023-03-14 00:00:00
80
225
18
PR-AKB
2023-03-14 00:00:00
64
64
19
PR-AKB
2023-03-14 00:00:00
66
131
20
PR-AKB
2023-03-14 00:00:00
70
201
21
PR-AKB
2023-03-14 00:00:00
31
31
22
PR-AKB
2023-03-14 00:00:00
45
76
23
PR-AKB
2023-03-15 00:00:00
72
148
24
PR-AKB
2023-03-15 00:00:00
70
218
25
PR-AKB
2023-03-15 00:00:00
64
64
26
PR-AKB
2023-03-15 00:00:00
75
139
27
PR-AKB
2023-03-15 00:00:00
52
191
28
PR-AKB
2023-03-15 00:00:00
58
58
29
PR-AKB
2023-03-16 00:00:00
36
94
30
PR-AKB
2023-03-16 00:00:00
47
141
31
PR-AKB
2023-03-16 00:00:00
14
155
32
PR-AKC
2023-03-13 00:00:00
86
86
33
PR-AKC
2023-03-13 00:00:00
82
83
34
PR-AKC
2023-03-13 00:00:00
83
166
….
I’ve marked in black where the sum must be reseted.
In one question in here I’ve found one code that almost did it, but instead of reset the sum it put a 0 as bellow but I don’t know how to fix it:
def my_accumulate(maxval):
val = 0
yield
while True:
if val < maxval:
val += yield val
else:
yield val
val = 0
def fn(x):
a = my_accumulate(185)
next(a)
x["TIME_LIMIT"] = [a.send(v) for v in x["Total_time"]]
return x
ATR = ATR.groupby('ac').apply(fn)
ATR
Result:
index
ac
flight_date
Total_time
TIME_LIMIT
0
PR-AKA
2023-03-13 00:00:00
87
0
1
PR-AKA
2023-03-13 00:00:00
55
55
2
PR-AKA
2023-03-13 00:00:00
59
114
3
PR-AKA
2023-03-13 00:00:00
71
185
4
PR-AKA
2023-03-14 00:00:00
70
0
5
PR-AKA
2023-03-14 00:00:00
58
58
6
PR-AKA
2023-03-14 00:00:00
45
103
7
PR-AKA
2023-03-14 00:00:00
41
144
8
PR-AKA
2023-03-14 00:00:00
33
177
9
PR-AKA
2023-03-14 00:00:00
42
219
10
PR-AKA
2023-03-15 00:00:00
70
0
11
PR-AKB
2023-03-13 00:00:00
109
0
12
PR-AKB
2023-03-13 00:00:00
100
100
13
PR-AKB
2023-03-13 00:00:00
92
192
14
PR-AKB
2023-03-13 00:00:00
102
0
15
PR-AKB
2023-03-14 00:00:00
71
71
16
PR-AKB
2023-03-14 00:00:00
74
145
17
PR-AKB
2023-03-14 00:00:00
80
225
18
PR-AKB
2023-03-14 00:00:00
64
0
19
PR-AKB
2023-03-14 00:00:00
66
66
20
PR-AKB
2023-03-14 00:00:00
70
136
21
PR-AKB
2023-03-14 00:00:00
31
167
22
PR-AKB
2023-03-14 00:00:00
45
212
23
PR-AKB
2023-03-15 00:00:00
72
0
24
PR-AKB
2023-03-15 00:00:00
70
70
25
PR-AKB
2023-03-15 00:00:00
64
134
26
PR-AKB
2023-03-15 00:00:00
75
209
27
PR-AKB
2023-03-15 00:00:00
52
0
28
PR-AKB
2023-03-15 00:00:00
58
58
29
PR-AKB
2023-03-16 00:00:00
36
94
30
PR-AKB
2023-03-16 00:00:00
47
141
31
PR-AKB
2023-03-16 00:00:00
14
155
32
PR-AKC
2023-03-13 00:00:00
86
0
33
PR-AKC
2023-03-13 00:00:00
82
82
34
PR-AKC
2023-03-13 00:00:00
83
165
Answers:
You could create your own numpy ufunc
to handle your problem, using np.frompyfunc
import numpy as np
import pandas as pd
df = pd.read_clipboard() # Your df here
f = np.frompyfunc(lambda x, y: x + y if x < 185 else y, nin=2, nout=1)
out = df.groupby("ac")["Total_time"].transform(f.accumulate).astype(int)
out:
0 87
1 142
2 201
3 71
4 141
5 199
6 45
7 86
8 119
9 161
10 231
11 109
12 209
13 92
14 194
15 71
16 145
17 225
18 64
19 130
20 200
21 31
22 76
23 148
24 218
25 64
26 139
27 191
28 58
29 94
30 141
31 155
32 86
33 168
34 251
Name: Total_time, dtype: int32
You can change your generator/function to:
def my_accumulate(s, maxval=float('inf')):
curr = 0
for x in s:
curr += x # add previous value
yield curr
if curr > maxval: # if above threshold
curr = 0 # reset the count
def fn(g):
return list(my_accumulate(g, maxval=185))
ATR['TIME_LIMIT'] = ATR.groupby('ac')['Total_time'].transform(fn)
Output:
index ac flight_date Total_time TIME_LIMIT
0 0 PR-AKA 2023-03-13 00:00:00 87 87
1 1 PR-AKA 2023-03-13 00:00:00 55 142
2 2 PR-AKA 2023-03-13 00:00:00 59 201
3 3 PR-AKA 2023-03-13 00:00:00 71 71
4 4 PR-AKA 2023-03-14 00:00:00 70 141
5 5 PR-AKA 2023-03-14 00:00:00 58 199
6 6 PR-AKA 2023-03-14 00:00:00 45 45
7 7 PR-AKA 2023-03-14 00:00:00 41 86
8 8 PR-AKA 2023-03-14 00:00:00 33 119
9 9 PR-AKA 2023-03-14 00:00:00 42 161
10 10 PR-AKA 2023-03-15 00:00:00 70 231
11 11 PR-AKB 2023-03-13 00:00:00 109 109
12 12 PR-AKB 2023-03-13 00:00:00 100 209
13 13 PR-AKB 2023-03-13 00:00:00 92 92
14 14 PR-AKB 2023-03-13 00:00:00 102 194
15 15 PR-AKB 2023-03-14 00:00:00 71 71
16 16 PR-AKB 2023-03-14 00:00:00 74 145
17 17 PR-AKB 2023-03-14 00:00:00 80 225
18 18 PR-AKB 2023-03-14 00:00:00 64 64
19 19 PR-AKB 2023-03-14 00:00:00 66 130
20 20 PR-AKB 2023-03-14 00:00:00 70 200
21 21 PR-AKB 2023-03-14 00:00:00 31 31
22 22 PR-AKB 2023-03-14 00:00:00 45 76
23 23 PR-AKB 2023-03-15 00:00:00 72 148
24 24 PR-AKB 2023-03-15 00:00:00 70 218
25 25 PR-AKB 2023-03-15 00:00:00 64 64
26 26 PR-AKB 2023-03-15 00:00:00 75 139
27 27 PR-AKB 2023-03-15 00:00:00 52 191
28 28 PR-AKB 2023-03-15 00:00:00 58 58
29 29 PR-AKB 2023-03-16 00:00:00 36 94
30 30 PR-AKB 2023-03-16 00:00:00 47 141
31 31 PR-AKB 2023-03-16 00:00:00 14 155
32 32 PR-AKC 2023-03-13 00:00:00 86 86
33 33 PR-AKC 2023-03-13 00:00:00 82 168
34 34 PR-AKC 2023-03-13 00:00:00 83 251
I’ve got one DataFrame that I would like to Sum the "Total_time" column and if it reaches a certain valeu the sum resets. The Sum need to be separated by different "ac"(using groupby) and when it reaches more than 185, the Sum is reseted.
The DataFrame:
index | ac | flight_date | Total_time |
---|---|---|---|
0 | PR-AKA | 2023-03-13 00:00:00 | 87 |
1 | PR-AKA | 2023-03-13 00:00:00 | 55 |
2 | PR-AKA | 2023-03-13 00:00:00 | 59 |
3 | PR-AKA | 2023-03-13 00:00:00 | 71 |
4 | PR-AKA | 2023-03-14 00:00:00 | 70 |
5 | PR-AKA | 2023-03-14 00:00:00 | 58 |
6 | PR-AKA | 2023-03-14 00:00:00 | 45 |
7 | PR-AKA | 2023-03-14 00:00:00 | 41 |
8 | PR-AKA | 2023-03-14 00:00:00 | 33 |
9 | PR-AKA | 2023-03-14 00:00:00 | 42 |
10 | PR-AKA | 2023-03-15 00:00:00 | 70 |
11 | PR-AKB | 2023-03-13 00:00:00 | 109 |
12 | PR-AKB | 2023-03-13 00:00:00 | 100 |
13 | PR-AKB | 2023-03-13 00:00:00 | 92 |
14 | PR-AKB | 2023-03-13 00:00:00 | 102 |
15 | PR-AKB | 2023-03-14 00:00:00 | 71 |
16 | PR-AKB | 2023-03-14 00:00:00 | 74 |
17 | PR-AKB | 2023-03-14 00:00:00 | 80 |
18 | PR-AKB | 2023-03-14 00:00:00 | 64 |
19 | PR-AKB | 2023-03-14 00:00:00 | 66 |
20 | PR-AKB | 2023-03-14 00:00:00 | 70 |
21 | PR-AKB | 2023-03-14 00:00:00 | 31 |
22 | PR-AKB | 2023-03-14 00:00:00 | 45 |
23 | PR-AKB | 2023-03-15 00:00:00 | 72 |
24 | PR-AKB | 2023-03-15 00:00:00 | 70 |
25 | PR-AKB | 2023-03-15 00:00:00 | 64 |
26 | PR-AKB | 2023-03-15 00:00:00 | 75 |
27 | PR-AKB | 2023-03-15 00:00:00 | 52 |
28 | PR-AKB | 2023-03-15 00:00:00 | 58 |
29 | PR-AKB | 2023-03-16 00:00:00 | 36 |
30 | PR-AKB | 2023-03-16 00:00:00 | 47 |
31 | PR-AKB | 2023-03-16 00:00:00 | 14 |
32 | PR-AKC | 2023-03-13 00:00:00 | 86 |
33 | PR-AKC | 2023-03-13 00:00:00 | 82 |
34 | PR-AKC | 2023-03-13 00:00:00 | 83 |
Expected result:
index | ac | flight_date | Total_time | Sum |
---|---|---|---|---|
0 | PR-AKA | 2023-03-13 00:00:00 | 87 | 87 |
1 | PR-AKA | 2023-03-13 00:00:00 | 55 | 142 |
2 | PR-AKA | 2023-03-13 00:00:00 | 59 | 201 |
3 | PR-AKA | 2023-03-13 00:00:00 | 71 | 71 |
4 | PR-AKA | 2023-03-14 00:00:00 | 70 | 141 |
5 | PR-AKA | 2023-03-14 00:00:00 | 58 | 199 |
6 | PR-AKA | 2023-03-14 00:00:00 | 45 | 45 |
7 | PR-AKA | 2023-03-14 00:00:00 | 41 | 86 |
8 | PR-AKA | 2023-03-14 00:00:00 | 33 | 119 |
9 | PR-AKA | 2023-03-14 00:00:00 | 42 | 161 |
10 | PR-AKA | 2023-03-15 00:00:00 | 70 | 231 |
11 | PR-AKB | 2023-03-13 00:00:00 | 109 | 109 |
12 | PR-AKB | 2023-03-13 00:00:00 | 100 | 209 |
13 | PR-AKB | 2023-03-13 00:00:00 | 92 | 92 |
14 | PR-AKB | 2023-03-13 00:00:00 | 102 | 194 |
15 | PR-AKB | 2023-03-14 00:00:00 | 71 | 71 |
16 | PR-AKB | 2023-03-14 00:00:00 | 74 | 145 |
17 | PR-AKB | 2023-03-14 00:00:00 | 80 | 225 |
18 | PR-AKB | 2023-03-14 00:00:00 | 64 | 64 |
19 | PR-AKB | 2023-03-14 00:00:00 | 66 | 131 |
20 | PR-AKB | 2023-03-14 00:00:00 | 70 | 201 |
21 | PR-AKB | 2023-03-14 00:00:00 | 31 | 31 |
22 | PR-AKB | 2023-03-14 00:00:00 | 45 | 76 |
23 | PR-AKB | 2023-03-15 00:00:00 | 72 | 148 |
24 | PR-AKB | 2023-03-15 00:00:00 | 70 | 218 |
25 | PR-AKB | 2023-03-15 00:00:00 | 64 | 64 |
26 | PR-AKB | 2023-03-15 00:00:00 | 75 | 139 |
27 | PR-AKB | 2023-03-15 00:00:00 | 52 | 191 |
28 | PR-AKB | 2023-03-15 00:00:00 | 58 | 58 |
29 | PR-AKB | 2023-03-16 00:00:00 | 36 | 94 |
30 | PR-AKB | 2023-03-16 00:00:00 | 47 | 141 |
31 | PR-AKB | 2023-03-16 00:00:00 | 14 | 155 |
32 | PR-AKC | 2023-03-13 00:00:00 | 86 | 86 |
33 | PR-AKC | 2023-03-13 00:00:00 | 82 | 83 |
34 | PR-AKC | 2023-03-13 00:00:00 | 83 | 166 |
….
I’ve marked in black where the sum must be reseted.
In one question in here I’ve found one code that almost did it, but instead of reset the sum it put a 0 as bellow but I don’t know how to fix it:
def my_accumulate(maxval):
val = 0
yield
while True:
if val < maxval:
val += yield val
else:
yield val
val = 0
def fn(x):
a = my_accumulate(185)
next(a)
x["TIME_LIMIT"] = [a.send(v) for v in x["Total_time"]]
return x
ATR = ATR.groupby('ac').apply(fn)
ATR
Result:
index | ac | flight_date | Total_time | TIME_LIMIT |
---|---|---|---|---|
0 | PR-AKA | 2023-03-13 00:00:00 | 87 | 0 |
1 | PR-AKA | 2023-03-13 00:00:00 | 55 | 55 |
2 | PR-AKA | 2023-03-13 00:00:00 | 59 | 114 |
3 | PR-AKA | 2023-03-13 00:00:00 | 71 | 185 |
4 | PR-AKA | 2023-03-14 00:00:00 | 70 | 0 |
5 | PR-AKA | 2023-03-14 00:00:00 | 58 | 58 |
6 | PR-AKA | 2023-03-14 00:00:00 | 45 | 103 |
7 | PR-AKA | 2023-03-14 00:00:00 | 41 | 144 |
8 | PR-AKA | 2023-03-14 00:00:00 | 33 | 177 |
9 | PR-AKA | 2023-03-14 00:00:00 | 42 | 219 |
10 | PR-AKA | 2023-03-15 00:00:00 | 70 | 0 |
11 | PR-AKB | 2023-03-13 00:00:00 | 109 | 0 |
12 | PR-AKB | 2023-03-13 00:00:00 | 100 | 100 |
13 | PR-AKB | 2023-03-13 00:00:00 | 92 | 192 |
14 | PR-AKB | 2023-03-13 00:00:00 | 102 | 0 |
15 | PR-AKB | 2023-03-14 00:00:00 | 71 | 71 |
16 | PR-AKB | 2023-03-14 00:00:00 | 74 | 145 |
17 | PR-AKB | 2023-03-14 00:00:00 | 80 | 225 |
18 | PR-AKB | 2023-03-14 00:00:00 | 64 | 0 |
19 | PR-AKB | 2023-03-14 00:00:00 | 66 | 66 |
20 | PR-AKB | 2023-03-14 00:00:00 | 70 | 136 |
21 | PR-AKB | 2023-03-14 00:00:00 | 31 | 167 |
22 | PR-AKB | 2023-03-14 00:00:00 | 45 | 212 |
23 | PR-AKB | 2023-03-15 00:00:00 | 72 | 0 |
24 | PR-AKB | 2023-03-15 00:00:00 | 70 | 70 |
25 | PR-AKB | 2023-03-15 00:00:00 | 64 | 134 |
26 | PR-AKB | 2023-03-15 00:00:00 | 75 | 209 |
27 | PR-AKB | 2023-03-15 00:00:00 | 52 | 0 |
28 | PR-AKB | 2023-03-15 00:00:00 | 58 | 58 |
29 | PR-AKB | 2023-03-16 00:00:00 | 36 | 94 |
30 | PR-AKB | 2023-03-16 00:00:00 | 47 | 141 |
31 | PR-AKB | 2023-03-16 00:00:00 | 14 | 155 |
32 | PR-AKC | 2023-03-13 00:00:00 | 86 | 0 |
33 | PR-AKC | 2023-03-13 00:00:00 | 82 | 82 |
34 | PR-AKC | 2023-03-13 00:00:00 | 83 | 165 |
You could create your own numpy ufunc
to handle your problem, using np.frompyfunc
import numpy as np
import pandas as pd
df = pd.read_clipboard() # Your df here
f = np.frompyfunc(lambda x, y: x + y if x < 185 else y, nin=2, nout=1)
out = df.groupby("ac")["Total_time"].transform(f.accumulate).astype(int)
out:
0 87
1 142
2 201
3 71
4 141
5 199
6 45
7 86
8 119
9 161
10 231
11 109
12 209
13 92
14 194
15 71
16 145
17 225
18 64
19 130
20 200
21 31
22 76
23 148
24 218
25 64
26 139
27 191
28 58
29 94
30 141
31 155
32 86
33 168
34 251
Name: Total_time, dtype: int32
You can change your generator/function to:
def my_accumulate(s, maxval=float('inf')):
curr = 0
for x in s:
curr += x # add previous value
yield curr
if curr > maxval: # if above threshold
curr = 0 # reset the count
def fn(g):
return list(my_accumulate(g, maxval=185))
ATR['TIME_LIMIT'] = ATR.groupby('ac')['Total_time'].transform(fn)
Output:
index ac flight_date Total_time TIME_LIMIT
0 0 PR-AKA 2023-03-13 00:00:00 87 87
1 1 PR-AKA 2023-03-13 00:00:00 55 142
2 2 PR-AKA 2023-03-13 00:00:00 59 201
3 3 PR-AKA 2023-03-13 00:00:00 71 71
4 4 PR-AKA 2023-03-14 00:00:00 70 141
5 5 PR-AKA 2023-03-14 00:00:00 58 199
6 6 PR-AKA 2023-03-14 00:00:00 45 45
7 7 PR-AKA 2023-03-14 00:00:00 41 86
8 8 PR-AKA 2023-03-14 00:00:00 33 119
9 9 PR-AKA 2023-03-14 00:00:00 42 161
10 10 PR-AKA 2023-03-15 00:00:00 70 231
11 11 PR-AKB 2023-03-13 00:00:00 109 109
12 12 PR-AKB 2023-03-13 00:00:00 100 209
13 13 PR-AKB 2023-03-13 00:00:00 92 92
14 14 PR-AKB 2023-03-13 00:00:00 102 194
15 15 PR-AKB 2023-03-14 00:00:00 71 71
16 16 PR-AKB 2023-03-14 00:00:00 74 145
17 17 PR-AKB 2023-03-14 00:00:00 80 225
18 18 PR-AKB 2023-03-14 00:00:00 64 64
19 19 PR-AKB 2023-03-14 00:00:00 66 130
20 20 PR-AKB 2023-03-14 00:00:00 70 200
21 21 PR-AKB 2023-03-14 00:00:00 31 31
22 22 PR-AKB 2023-03-14 00:00:00 45 76
23 23 PR-AKB 2023-03-15 00:00:00 72 148
24 24 PR-AKB 2023-03-15 00:00:00 70 218
25 25 PR-AKB 2023-03-15 00:00:00 64 64
26 26 PR-AKB 2023-03-15 00:00:00 75 139
27 27 PR-AKB 2023-03-15 00:00:00 52 191
28 28 PR-AKB 2023-03-15 00:00:00 58 58
29 29 PR-AKB 2023-03-16 00:00:00 36 94
30 30 PR-AKB 2023-03-16 00:00:00 47 141
31 31 PR-AKB 2023-03-16 00:00:00 14 155
32 32 PR-AKC 2023-03-13 00:00:00 86 86
33 33 PR-AKC 2023-03-13 00:00:00 82 168
34 34 PR-AKC 2023-03-13 00:00:00 83 251