Lag shift a long table in pandas
Question:
I have a pandas dataframe that looks like the following:
ticker t shout_t shout_tminus
A 2010-01-01 22
A 2010-01-02 23
A 2010-01-03 24
B 2010-01-01 44
B 2010-01-02 55
B 2010-01-03 66
C 2010-01-01 100
C 2010-01-02 22
C 2010-01-03 33
I want to lag shift this dataframe by 1 day and compute the shout_tminus value. Ideally I would just call df.shift(1), but that would be a mistake, because the shift would carry values across ticker boundaries. What I would like is:
A 2010-01-01 22 NA
A 2010-01-02 23 22
A 2010-01-03 24 23
That is, shout_tminus should hold the previous day's shout_t within each ticker, and likewise for B and C. I did the following:
ids = ['A', 'B', 'C']
df['shout_tminus'] = None
for key in ids:
    # Shift within a copy of one ticker's rows, then write the result back
    temp = df[df.ticker == key].copy()
    temp['shout_tminus'] = temp['shout_t'].shift(1)
    df.update(temp)
The problem is that my dataframe is large: it has 10 million rows, and running this loop over 1000 tickers takes forever. Is there a faster way to shift a series correctly in a long-table dataframe? Thanks.
Answers:
All you need is to add a groupby('ticker'):
df['shout_tminus'] = (
    df.sort_values(['ticker', 't'])
      .groupby('ticker')
      ['shout_t']
      .shift()
)
Result:
ticker t shout_t shout_tminus
A 2010-01-01 22 NaN
A 2010-01-02 23 22.0
A 2010-01-03 24 23.0
B 2010-01-01 44 NaN
B 2010-01-02 55 44.0
B 2010-01-03 66 55.0
C 2010-01-01 100 NaN
C 2010-01-02 22 100.0
C 2010-01-03 33 22.0
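
If you want to check this end to end, here is a minimal sketch that rebuilds the sample frame from the question and compares a plain shift with the grouped one. The construction of df and the variable name naive are my own additions for illustration; only the groupby/shift chain comes from the answer above. It assumes t is parsed as a datetime column.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ticker": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "t": pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03"] * 3),
    "shout_t": [22, 23, 24, 44, 55, 66, 100, 22, 33],
})

# A plain shift ignores ticker boundaries, so the first row of B
# receives the last value of A (hypothetical name, for comparison only)
naive = df["shout_t"].shift(1)

# The grouped shift restarts at every ticker. The assignment aligns on the
# index, so values land on the right rows even if df itself is not sorted.
df["shout_tminus"] = (
    df.sort_values(["ticker", "t"])
      .groupby("ticker")["shout_t"]
      .shift(1)
)

print(df)
```

Note that this shifts by one row per ticker, which equals one day only when every ticker has a row for every consecutive date.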