How can I add a stacked growth column to my time series data in pandas?
Question:
I have a table like below, using this dummy data:
data = [['Jane', 10,10.5,11,13,45,41,66,21,88,99,77,84,66,8,77,22,11,44,69,85,36,4,
87,74,56,88,23,6,9,8,55,12,4,58,36,44,89,81,7,98,52,11,45,87,96,32,58,76],
['John',11,22,55,23,6,9,8,41,12,4,58,66,99,36,44,89,81,7,98,52,33,11,45,87,
96,32,58,55,12,4,58,13,36,52,89,81,7,98,34,44,11,25,22,44,55,88,56,47],
['Tom',23,32,43,12,10,10.5,11,46,45,45,46,66,21,88,99,77,8,77,22,11,44,69,8511,44,77,85,99,45,63,
89,81,7,98,3,52,89,81,7,98,13,58,36,44,89,81,7,98,52]]
df = pd.DataFrame(data, columns = ['Name','January-19','February-19','March-19','April-19','May-19','June-19',
'July-19','August-19', 'September-19', 'October-19','November-19', 'December-19',
'January-20','February-20','March-20','April-20','May-20','June-20',
'July-20','August-20', 'September-20', 'October-20','November-20','December-20',
'January-21','February-21','March-21','April-21','May-21','June-21',
'July-21','August-21', 'September-21', 'October-21','November-21',
'December-21','Q1 2019',
'Q2 2019', 'Q3 2019', 'Q4 2019','Q1 2020',
'Q2 2020', 'Q3 2020', 'Q4 2020','Q1 2021',
'Q2 2021', 'Q3 2021', 'Q4 2021'])
This results in the follow dataframe:
Each entry indicates a growth rate.
I need to calculate 2Y and 3Y stacked growth rates using a specific formula for each month and quarter in 2021.
Formula example:
3Y Jan 21 = ((1+Jan 2021 rate) * (1+Jan 2020 rate) * (1+Jan 2019 rate)-1
2Y Jan 21 = ((1+Jan 2021 rate) * (1+Jan 2020 rate))-1
I had initially done the following, which was to transpose the data and try to use the percentage change method in Pandas, indicating a time period of 24 months for the 2 year stacked column:
test1 = df.T
new_header = test1.iloc[0] #grab the first row for the header
test1 = test1[1:] #take the data less the header row
test1.columns = new_header #
test1 = test1.reset_index()
test1.rename(columns={'index': 'Date'}, inplace = True)
test1['Jane_2Y_Growth'] = test1['Jane'].pct_change(24)
test1.tail(15)
However I don’t think its correct because I get a different answer when I manually do the formula.
Does anyone know how to apply this formula to all of my periods in 2021? 12 months and 4 quarters.
So, the 2Y stacked growth column for Q4 2021 for example, would use the Q4 figures from 2021 and 2020.
Thanks
Answers:
Based on the formula example you have given :
3Y Jan 21 = ((1+Jan 2021 rate) * (1+Jan 2020 rate) * (1+Jan 2019 rate)-1
2Y Jan 21 = ((1+Jan 2021 rate) * (1+Jan 2020 rate))-1
Transposing and filtering values of interest from dataframe
NDF = df.T
NDF, NDF.columns = NDF[1:] , NDF.iloc[0]
result_df = NDF.loc[[x for x in NDF.index if '21' in x ]]
Creating placeholder columns for the resultant dataframe
for col in NDF.columns:
result_df[col + '_2Y Growth'] = 0
result_df[col + '_3Y Growth'] = 0
The above gives us
result_df
Name Jane John Tom ... John_3Y Growth Tom_2Y Growth Tom_3Y Growth
January-21 56 96 77 ... 0 0 0
February-21 88 32 85 ... 0 0 0
March-21 23 58 99 ... 0 0 0
April-21 6 55 45 ... 0 0 0
May-21 9 12 63 ... 0 0 0
June-21 8 4 89 ... 0 0 0
July-21 55 58 81 ... 0 0 0
August-21 12 13 7 ... 0 0 0
September-21 4 36 98 ... 0 0 0
October-21 58 52 3 ... 0 0 0
November-21 36 89 52 ... 0 0 0
December-21 44 81 89 ... 0 0 0
Q1 2021 96 55 81 ... 0 0 0
Q2 2021 32 88 7 ... 0 0 0
Q3 2021 58 56 98 ... 0 0 0
Q4 2021 76 47 52 ... 0 0 0
Applying the formula :
for idx in NDF.index:
if('21' in idx):
idx20 = idx[:-2] + '20'
idx19 = idx[:-2] + '19'
for col in NDF.columns:
result_df.loc[idx, col + '_3Y Growth'] = (1 + NDF.loc[idx, col]) * (1 + NDF.loc[idx20, col]) * (1 + NDF.loc[idx19, col]) - 1
result_df.loc[idx, col + '_2Y Growth'] = (1 + NDF.loc[idx, col]) * (1 + NDF.loc[idx20, col]) - 1
Output :
This gives us the expected output based on the formula
Jane
John
Tom
Jane_2Y Growth
Jane_3Y Growth
John_2Y Growth
John_3Y Growth
Tom_2Y Growth
Tom_3Y Growth
January-21
56
96
77
3818
42008
9699
116399
1715
41183
February-21
88
32
85
800
9210.5
1220
28082
7653
252581
March-21
23
58
99
1871
22463
2654
148679
9999
439999
April-21
6
55
45
160
2253
5039
120959
3587
46643
May-21
9
12
63
119
5519
1065
7461
575
6335
June-21
8
4
89
404
17009
39
399
7019
80729
July-21
55
58
81
3919
262639
5840
52568
1885
22631
August-21
12
13
7
1117
24595
741
31163
95
4511
September-21
4
36
98
184
16464
1257
16353
4454
204929
October-21
58
52
3
294
29499
635
3179
279
12879
November-21
36
89
52
3255
253967
4139
244259
451135
21203391
December-21
44
81
89
3374
286874
7215
483471
4049
271349
Q1 2021
96
55
81
5140
462689
671
5375
4837
396715
Q2 2021
32
88
7
395
32471
2313
229085
295
2367
Q3 2021
58
56
98
2713
21711
1310
45884
4454
441044
Q4 2021
76
47
52
6775
670823
2159
97199
4769
66779
EDIT : As OP asked,
Is there a way to add the new columns to the bottom instead and
instead of labelling them ‘Jane_2Y Growth’ for example, label then
‘January-21 2Y growth’. This way they are added to the index and do
not have any names in them, just the date and the growth period.
Instead of adding the result to results
dataframe we can add it to NDF
datafrmae in the appropriate cell. The result would be stored in NDF
for idx in NDF.index:
if('21' in idx):
idx20 = idx[:-2] + '20'
idx19 = idx[:-2] + '19'
for col in NDF.columns:
NDF.loc[idx + ' 3Y Growth', col] = (1 + NDF.loc[idx, col]) * (1 + NDF.loc[idx20, col]) * (1 + NDF.loc[idx19, col]) - 1
NDF.loc[idx + ' 2Y Growth', col] = (1 + NDF.loc[idx, col]) * (1 + NDF.loc[idx20, col]) - 1
This gives us the expected output :
Jane
John
Tom
January-19
10
11
23
February-19
10.5
22
32
March-19
11
55
43
April-19
13
23
12
May-19
45
6
10
June-19
41
9
10.5
July-19
66
8
11
August-19
21
41
46
September-19
88
12
45
October-19
99
4
45
November-19
77
58
46
December-19
84
66
66
January-20
66
99
21
February-20
8
36
88
March-20
77
44
99
April-20
22
89
77
May-20
11
81
8
June-20
44
7
77
July-20
69
98
22
August-20
85
52
11
September-20
36
33
44
October-20
4
11
69
November-20
87
45
8511
December-20
74
87
44
January-21
56
96
77
February-21
88
32
85
March-21
23
58
99
April-21
6
55
45
May-21
9
12
63
June-21
8
4
89
July-21
55
58
81
August-21
12
13
7
September-21
4
36
98
October-21
58
52
3
November-21
36
89
52
December-21
44
81
89
Q1 2019
89
7
81
Q2 2019
81
98
7
Q3 2019
7
34
98
Q4 2019
98
44
13
Q1 2020
52
11
58
Q2 2020
11
25
36
Q3 2020
45
22
44
Q4 2020
87
44
89
Q1 2021
96
55
81
Q2 2021
32
88
7
Q3 2021
58
56
98
Q4 2021
76
47
52
January-21 3Y Growth
42008
116399
41183
January-21 2Y Growth
3818
9699
1715
February-21 3Y Growth
9210.5
28082
252581
February-21 2Y Growth
800
1220
7653
March-21 3Y Growth
22463
148679
439999
March-21 2Y Growth
1871
2654
9999
April-21 3Y Growth
2253
120959
46643
April-21 2Y Growth
160
5039
3587
May-21 3Y Growth
5519
7461
6335
May-21 2Y Growth
119
1065
575
June-21 3Y Growth
17009
399
80729
June-21 2Y Growth
404
39
7019
July-21 3Y Growth
262639
52568
22631
July-21 2Y Growth
3919
5840
1885
August-21 3Y Growth
24595
31163
4511
August-21 2Y Growth
1117
741
95
September-21 3Y Growth
16464
16353
204929
September-21 2Y Growth
184
1257
4454
October-21 3Y Growth
29499
3179
12879
October-21 2Y Growth
294
635
279
November-21 3Y Growth
253967
244259
2.12034e+07
November-21 2Y Growth
3255
4139
451135
December-21 3Y Growth
286874
483471
271349
December-21 2Y Growth
3374
7215
4049
Q1 2021 3Y Growth
462689
5375
396715
Q1 2021 2Y Growth
5140
671
4837
Q2 2021 3Y Growth
32471
229085
2367
Q2 2021 2Y Growth
395
2313
295
Q3 2021 3Y Growth
21711
45884
441044
Q3 2021 2Y Growth
2713
1310
4454
Q4 2021 3Y Growth
670823
97199
66779
Q4 2021 2Y Growth
6775
2159
4769
I have a table like below, using this dummy data:
data = [['Jane', 10,10.5,11,13,45,41,66,21,88,99,77,84,66,8,77,22,11,44,69,85,36,4,
87,74,56,88,23,6,9,8,55,12,4,58,36,44,89,81,7,98,52,11,45,87,96,32,58,76],
['John',11,22,55,23,6,9,8,41,12,4,58,66,99,36,44,89,81,7,98,52,33,11,45,87,
96,32,58,55,12,4,58,13,36,52,89,81,7,98,34,44,11,25,22,44,55,88,56,47],
['Tom',23,32,43,12,10,10.5,11,46,45,45,46,66,21,88,99,77,8,77,22,11,44,69,8511,44,77,85,99,45,63,
89,81,7,98,3,52,89,81,7,98,13,58,36,44,89,81,7,98,52]]
df = pd.DataFrame(data, columns = ['Name','January-19','February-19','March-19','April-19','May-19','June-19',
'July-19','August-19', 'September-19', 'October-19','November-19', 'December-19',
'January-20','February-20','March-20','April-20','May-20','June-20',
'July-20','August-20', 'September-20', 'October-20','November-20','December-20',
'January-21','February-21','March-21','April-21','May-21','June-21',
'July-21','August-21', 'September-21', 'October-21','November-21',
'December-21','Q1 2019',
'Q2 2019', 'Q3 2019', 'Q4 2019','Q1 2020',
'Q2 2020', 'Q3 2020', 'Q4 2020','Q1 2021',
'Q2 2021', 'Q3 2021', 'Q4 2021'])
This results in the follow dataframe:
Each entry indicates a growth rate.
I need to calculate 2Y and 3Y stacked growth rates using a specific formula for each month and quarter in 2021.
Formula example:
3Y Jan 21 = ((1+Jan 2021 rate) * (1+Jan 2020 rate) * (1+Jan 2019 rate)-1
2Y Jan 21 = ((1+Jan 2021 rate) * (1+Jan 2020 rate))-1
I had initially done the following, which was to transpose the data and try to use the percentage change method in Pandas, indicating a time period of 24 months for the 2 year stacked column:
test1 = df.T
new_header = test1.iloc[0] #grab the first row for the header
test1 = test1[1:] #take the data less the header row
test1.columns = new_header #
test1 = test1.reset_index()
test1.rename(columns={'index': 'Date'}, inplace = True)
test1['Jane_2Y_Growth'] = test1['Jane'].pct_change(24)
test1.tail(15)
However I don’t think its correct because I get a different answer when I manually do the formula.
Does anyone know how to apply this formula to all of my periods in 2021? 12 months and 4 quarters.
So, the 2Y stacked growth column for Q4 2021 for example, would use the Q4 figures from 2021 and 2020.
Thanks
Based on the formula example you have given :
3Y Jan 21 = ((1+Jan 2021 rate) * (1+Jan 2020 rate) * (1+Jan 2019 rate)-1
2Y Jan 21 = ((1+Jan 2021 rate) * (1+Jan 2020 rate))-1
Transposing and filtering values of interest from dataframe
NDF = df.T
NDF, NDF.columns = NDF[1:] , NDF.iloc[0]
result_df = NDF.loc[[x for x in NDF.index if '21' in x ]]
Creating placeholder columns for the resultant dataframe
for col in NDF.columns:
result_df[col + '_2Y Growth'] = 0
result_df[col + '_3Y Growth'] = 0
The above gives us
result_df
Name Jane John Tom ... John_3Y Growth Tom_2Y Growth Tom_3Y Growth
January-21 56 96 77 ... 0 0 0
February-21 88 32 85 ... 0 0 0
March-21 23 58 99 ... 0 0 0
April-21 6 55 45 ... 0 0 0
May-21 9 12 63 ... 0 0 0
June-21 8 4 89 ... 0 0 0
July-21 55 58 81 ... 0 0 0
August-21 12 13 7 ... 0 0 0
September-21 4 36 98 ... 0 0 0
October-21 58 52 3 ... 0 0 0
November-21 36 89 52 ... 0 0 0
December-21 44 81 89 ... 0 0 0
Q1 2021 96 55 81 ... 0 0 0
Q2 2021 32 88 7 ... 0 0 0
Q3 2021 58 56 98 ... 0 0 0
Q4 2021 76 47 52 ... 0 0 0
Applying the formula :
for idx in NDF.index:
if('21' in idx):
idx20 = idx[:-2] + '20'
idx19 = idx[:-2] + '19'
for col in NDF.columns:
result_df.loc[idx, col + '_3Y Growth'] = (1 + NDF.loc[idx, col]) * (1 + NDF.loc[idx20, col]) * (1 + NDF.loc[idx19, col]) - 1
result_df.loc[idx, col + '_2Y Growth'] = (1 + NDF.loc[idx, col]) * (1 + NDF.loc[idx20, col]) - 1
Output :
This gives us the expected output based on the formula
Jane | John | Tom | Jane_2Y Growth | Jane_3Y Growth | John_2Y Growth | John_3Y Growth | Tom_2Y Growth | Tom_3Y Growth | |
---|---|---|---|---|---|---|---|---|---|
January-21 | 56 | 96 | 77 | 3818 | 42008 | 9699 | 116399 | 1715 | 41183 |
February-21 | 88 | 32 | 85 | 800 | 9210.5 | 1220 | 28082 | 7653 | 252581 |
March-21 | 23 | 58 | 99 | 1871 | 22463 | 2654 | 148679 | 9999 | 439999 |
April-21 | 6 | 55 | 45 | 160 | 2253 | 5039 | 120959 | 3587 | 46643 |
May-21 | 9 | 12 | 63 | 119 | 5519 | 1065 | 7461 | 575 | 6335 |
June-21 | 8 | 4 | 89 | 404 | 17009 | 39 | 399 | 7019 | 80729 |
July-21 | 55 | 58 | 81 | 3919 | 262639 | 5840 | 52568 | 1885 | 22631 |
August-21 | 12 | 13 | 7 | 1117 | 24595 | 741 | 31163 | 95 | 4511 |
September-21 | 4 | 36 | 98 | 184 | 16464 | 1257 | 16353 | 4454 | 204929 |
October-21 | 58 | 52 | 3 | 294 | 29499 | 635 | 3179 | 279 | 12879 |
November-21 | 36 | 89 | 52 | 3255 | 253967 | 4139 | 244259 | 451135 | 21203391 |
December-21 | 44 | 81 | 89 | 3374 | 286874 | 7215 | 483471 | 4049 | 271349 |
Q1 2021 | 96 | 55 | 81 | 5140 | 462689 | 671 | 5375 | 4837 | 396715 |
Q2 2021 | 32 | 88 | 7 | 395 | 32471 | 2313 | 229085 | 295 | 2367 |
Q3 2021 | 58 | 56 | 98 | 2713 | 21711 | 1310 | 45884 | 4454 | 441044 |
Q4 2021 | 76 | 47 | 52 | 6775 | 670823 | 2159 | 97199 | 4769 | 66779 |
EDIT : As OP asked,
Is there a way to add the new columns to the bottom instead and
instead of labelling them ‘Jane_2Y Growth’ for example, label then
‘January-21 2Y growth’. This way they are added to the index and do
not have any names in them, just the date and the growth period.
Instead of adding the result to results
dataframe we can add it to NDF
datafrmae in the appropriate cell. The result would be stored in NDF
for idx in NDF.index:
if('21' in idx):
idx20 = idx[:-2] + '20'
idx19 = idx[:-2] + '19'
for col in NDF.columns:
NDF.loc[idx + ' 3Y Growth', col] = (1 + NDF.loc[idx, col]) * (1 + NDF.loc[idx20, col]) * (1 + NDF.loc[idx19, col]) - 1
NDF.loc[idx + ' 2Y Growth', col] = (1 + NDF.loc[idx, col]) * (1 + NDF.loc[idx20, col]) - 1
This gives us the expected output :
Jane | John | Tom | |
---|---|---|---|
January-19 | 10 | 11 | 23 |
February-19 | 10.5 | 22 | 32 |
March-19 | 11 | 55 | 43 |
April-19 | 13 | 23 | 12 |
May-19 | 45 | 6 | 10 |
June-19 | 41 | 9 | 10.5 |
July-19 | 66 | 8 | 11 |
August-19 | 21 | 41 | 46 |
September-19 | 88 | 12 | 45 |
October-19 | 99 | 4 | 45 |
November-19 | 77 | 58 | 46 |
December-19 | 84 | 66 | 66 |
January-20 | 66 | 99 | 21 |
February-20 | 8 | 36 | 88 |
March-20 | 77 | 44 | 99 |
April-20 | 22 | 89 | 77 |
May-20 | 11 | 81 | 8 |
June-20 | 44 | 7 | 77 |
July-20 | 69 | 98 | 22 |
August-20 | 85 | 52 | 11 |
September-20 | 36 | 33 | 44 |
October-20 | 4 | 11 | 69 |
November-20 | 87 | 45 | 8511 |
December-20 | 74 | 87 | 44 |
January-21 | 56 | 96 | 77 |
February-21 | 88 | 32 | 85 |
March-21 | 23 | 58 | 99 |
April-21 | 6 | 55 | 45 |
May-21 | 9 | 12 | 63 |
June-21 | 8 | 4 | 89 |
July-21 | 55 | 58 | 81 |
August-21 | 12 | 13 | 7 |
September-21 | 4 | 36 | 98 |
October-21 | 58 | 52 | 3 |
November-21 | 36 | 89 | 52 |
December-21 | 44 | 81 | 89 |
Q1 2019 | 89 | 7 | 81 |
Q2 2019 | 81 | 98 | 7 |
Q3 2019 | 7 | 34 | 98 |
Q4 2019 | 98 | 44 | 13 |
Q1 2020 | 52 | 11 | 58 |
Q2 2020 | 11 | 25 | 36 |
Q3 2020 | 45 | 22 | 44 |
Q4 2020 | 87 | 44 | 89 |
Q1 2021 | 96 | 55 | 81 |
Q2 2021 | 32 | 88 | 7 |
Q3 2021 | 58 | 56 | 98 |
Q4 2021 | 76 | 47 | 52 |
January-21 3Y Growth | 42008 | 116399 | 41183 |
January-21 2Y Growth | 3818 | 9699 | 1715 |
February-21 3Y Growth | 9210.5 | 28082 | 252581 |
February-21 2Y Growth | 800 | 1220 | 7653 |
March-21 3Y Growth | 22463 | 148679 | 439999 |
March-21 2Y Growth | 1871 | 2654 | 9999 |
April-21 3Y Growth | 2253 | 120959 | 46643 |
April-21 2Y Growth | 160 | 5039 | 3587 |
May-21 3Y Growth | 5519 | 7461 | 6335 |
May-21 2Y Growth | 119 | 1065 | 575 |
June-21 3Y Growth | 17009 | 399 | 80729 |
June-21 2Y Growth | 404 | 39 | 7019 |
July-21 3Y Growth | 262639 | 52568 | 22631 |
July-21 2Y Growth | 3919 | 5840 | 1885 |
August-21 3Y Growth | 24595 | 31163 | 4511 |
August-21 2Y Growth | 1117 | 741 | 95 |
September-21 3Y Growth | 16464 | 16353 | 204929 |
September-21 2Y Growth | 184 | 1257 | 4454 |
October-21 3Y Growth | 29499 | 3179 | 12879 |
October-21 2Y Growth | 294 | 635 | 279 |
November-21 3Y Growth | 253967 | 244259 | 2.12034e+07 |
November-21 2Y Growth | 3255 | 4139 | 451135 |
December-21 3Y Growth | 286874 | 483471 | 271349 |
December-21 2Y Growth | 3374 | 7215 | 4049 |
Q1 2021 3Y Growth | 462689 | 5375 | 396715 |
Q1 2021 2Y Growth | 5140 | 671 | 4837 |
Q2 2021 3Y Growth | 32471 | 229085 | 2367 |
Q2 2021 2Y Growth | 395 | 2313 | 295 |
Q3 2021 3Y Growth | 21711 | 45884 | 441044 |
Q3 2021 2Y Growth | 2713 | 1310 | 4454 |
Q4 2021 3Y Growth | 670823 | 97199 | 66779 |
Q4 2021 2Y Growth | 6775 | 2159 | 4769 |