How do I create a new column that references other rows' data for its values?
Question:
I have the following data frame:
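|  | Month | Day | Year | Open | High | Low | Close | Week |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 2003 | 46.593 | 46.656 | 46.405 | 46.468 | 1 |
| 1 | 1 | 2 | 2003 | 46.538 | 46.66 | 46.47 | 46.673 | 1 |
| 2 | 1 | 3 | 2003 | 46.717 | 46.781 | 46.53 | 46.750 | 1 |
| 3 | 1 | 4 | 2003 | 46.815 | 46.843 | 46.68 | 46.750 | 1 |
| 4 | 1 | 5 | 2003 | 46.935 | 47.000 | 46.56 | 46.593 | 1 |
| … | … | … | … | … | … | … | … | … |
| 7257 | 10 | 26 | 2022 | 381.619 | 387.5799 | 381.350 | 382.019 | 43 |
| 7258 | 10 | 27 | 2022 | 383.07 | 385.00 | 379.329 | 379.98 | 43 |
| 7259 | 10 | 28 | 2022 | 379.869 | 389.519 | 379.67 | 389.019 | 43 |
| 7260 | 10 | 31 | 2022 | 386.44 | 388.399 | 385.26 | 386.209 | 44 |
| 7261 | 11 | 1 | 2022 | 390.14 | 390.39 | 383.29 | 384.519 | 44 |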
I want to create a new column titled ‘week high’ which will reference each week every year and pull in the high. So for Week 1, Year 2003, it will take the Highest High from rows 0 to 4 but for Week 43, Year 2022, it will take the Highest High from rows 7257 to 7259.
Is it possible to reference the columns Week and Year to calculate that value? Thanks!
Answers:
I am assuming you are using pandas; other libraries will work similarly.
Create a new DataFrame aggregated per week using groupby and join it back to your original DataFrame. Group on both Year and Week, because the same week number recurs every year:
df_grouped = df[["Year", "Week", "High"]].groupby(["Year", "Week"]).max().rename(columns={"High": "Highest High"})
df_result = df.join(df_grouped, on=["Year", "Week"])
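For reference, here is a minimal, self-contained sketch of the aggregate-then-join approach. The sample frame is hypothetical: it only copies a few rows and columns from the question's table, and the name `weekly_high` is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: a handful of rows taken from the question's table.
df = pd.DataFrame({
    "Month": [1, 1, 1, 1, 1, 10, 10, 10, 10, 11],
    "Day": [1, 2, 3, 4, 5, 26, 27, 28, 31, 1],
    "Year": [2003] * 5 + [2022] * 5,
    "High": [46.656, 46.66, 46.781, 46.843, 47.000,
             387.5799, 385.00, 389.519, 388.399, 390.39],
    "Week": [1, 1, 1, 1, 1, 43, 43, 43, 44, 44],
})

# One row per (Year, Week) holding the maximum High of that week.
weekly_high = (
    df.groupby(["Year", "Week"])["High"]
      .max()
      .rename("Highest High")
)

# Join the aggregate back onto every original row via the two key columns.
df_result = df.join(weekly_high, on=["Year", "Week"])
print(df_result)
```

The join keeps the original row count and simply repeats each week's maximum on every row belonging to that week.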
Assuming pandas, create a weekly period and use it as grouper for transform('max'):
import pandas as pd
group = pd.to_datetime(df[['Year', 'Month', 'Day']]).dt.to_period('W')
# or, if you already have a "Week" column, group on it together with "Year"
# group = ["Year", "Week"]
df['week_high'] = df.groupby(group)['High'].transform('max')
Output:
Month Day Year Open High Low Close Week week_high
0 1 1 2003 46.593 46.6560 46.405 46.468 1.0 47.000
1 1 2 2003 46.538 46.6600 46.470 46.673 1.0 47.000
2 1 3 2003 46.717 46.7810 46.530 46.750 1.0 47.000
3 1 4 2003 46.815 46.8430 46.680 46.750 1.0 47.000
4 1 5 2003 46.935 47.0000 46.560 46.593 1.0 47.000
7257 10 26 2022 381.619 387.5799 381.350 382.019 43.0 389.519
7258 10 27 2022 383.070 385.0000 379.329 379.980 43.0 389.519
7259 10 28 2022 379.869 389.5190 379.670 389.019 43.0 389.519
7260 10 31 2022 386.440 388.3990 385.260 386.209 44.0 390.390
7261 11 1 2022 390.140 390.3900 383.290 384.519 44.0 390.390
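If you group on an existing week-number column rather than a weekly period, it may help to see why the year has to be part of the key. The sketch below uses made-up numbers (the 2004 rows are not from the question) in which week 1 occurs in two different years:

```python
import pandas as pd

# Hypothetical data: week 1 appears in both 2003 and 2004.
df = pd.DataFrame({
    "Year": [2003, 2003, 2004, 2004],
    "Week": [1, 1, 1, 1],
    "High": [46.656, 47.000, 50.10, 49.80],
})

# Grouping on Week alone pools both years into a single group.
df["week_only"] = df.groupby("Week")["High"].transform("max")

# Grouping on Year and Week keeps each year's week 1 separate.
df["week_high"] = df.groupby(["Year", "Week"])["High"].transform("max")

print(df)
```

Because `transform` returns a result aligned to the original index, no join is needed. Note that `to_period('W')` uses weeks ending on Sunday by default, so a period that spans New Year stays in one group, which can differ from grouping on separate Year and Week columns.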