While using groupby, set column to value from some row within dataframe
Question:
I have a df set up like the one below. For each Date and Ticker combination, Bool is 1 on exactly one row.
Date | Ticker | High | Low | Bool |
---|---|---|---|---|
2023-02-20 | AAPL | 146 | 144 | 0 |
2023-02-20 | AAPL | 143 | 142 | 0 |
2023-02-20 | AAPL | 144 | 143 | 1 |
2023-02-20 | MSFT | 146 | 144 | 0 |
2023-02-20 | MSFT | 143 | 142 | 1 |
2023-02-20 | MSFT | 144 | 143 | 0 |
2023-02-21 | AAPL | 146 | 144 | 0 |
2023-02-21 | AAPL | 143 | 142 | 1 |
2023-02-21 | AAPL | 144 | 143 | 0 |
I want to create a new column that, for every row in a Date and Ticker group, holds the High from the row where Bool is 1.
The output df would look like the following:
Date | Ticker | High | Low | Bool | New_Row |
---|---|---|---|---|---|
2023-02-20 | AAPL | 146 | 144 | 0 | 144 |
2023-02-20 | AAPL | 143 | 142 | 0 | 144 |
2023-02-20 | AAPL | 144 | 143 | 1 | 144 |
2023-02-20 | MSFT | 251 | 248 | 0 | 252 |
2023-02-20 | MSFT | 252 | 251 | 1 | 252 |
2023-02-20 | MSFT | 255 | 250 | 0 | 252 |
2023-02-21 | AAPL | 146 | 144 | 0 | 143 |
2023-02-21 | AAPL | 143 | 142 | 1 | 143 |
2023-02-21 | AAPL | 144 | 143 | 0 | 143 |
I am unsure how to do this without looping through everything manually, which is probably not the best way to go about it.
Answers:
You may create a new column named High_times_Bool:
df['High_times_Bool'] = df['High'] * df['Bool']
Then you may group by Date, Ticker and select the max High_times_Bool for each sub group:
df.groupby(['Date', 'Ticker'])['High_times_Bool'].max()
Finally you may inject the output values back into the original dataframe, following the logic outlined here.
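One way to do that last step in place is `transform("max")`, which broadcasts each group's maximum back onto every row of the group. The sketch below is a minimal example, not a definitive implementation. It rebuilds the question's input table by hand and assumes that High is always positive and that Bool is 1 on exactly one row per group, since otherwise the product trick breaks down. The `New_Row_alt` column is a swapped-in variant, not part of the answer above: it masks with `where` instead of multiplying, so it does not depend on the sign of High.

```python
import pandas as pd

# Input table from the question, rebuilt by hand for illustration
df = pd.DataFrame({
    "Date": ["2023-02-20"] * 6 + ["2023-02-21"] * 3,
    "Ticker": ["AAPL"] * 3 + ["MSFT"] * 3 + ["AAPL"] * 3,
    "High": [146, 143, 144, 146, 143, 144, 146, 143, 144],
    "Low": [144, 142, 143, 144, 142, 143, 144, 142, 143],
    "Bool": [0, 0, 1, 0, 1, 0, 0, 1, 0],
})

# High survives only on the flagged row; every other row becomes 0
df["High_times_Bool"] = df["High"] * df["Bool"]

# transform("max") returns one value per original row, aligned on the index,
# so the result can be assigned straight back to the dataframe
df["New_Row"] = df.groupby(["Date", "Ticker"])["High_times_Bool"].transform("max")

# The helper column is no longer needed
df = df.drop(columns="High_times_Bool")

# Variant (assumption, not from the answer): mask unflagged rows with NaN
# instead of multiplying, so negative or zero values would also work.
# The result is float because of the intermediate NaNs.
masked = df["High"].where(df["Bool"].eq(1))
df["New_Row_alt"] = masked.groupby([df["Date"], df["Ticker"]]).transform("max")

print(df)
```

With the input table's values, New_Row is 144 for AAPL on 2023-02-20, and 143 for the other two groups.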