lag shift a long table in pandas

Question

I have a pandas dataframe that looks like the following:

ticker   t               shout_t     shout_tminus
A        2010-01-01      22
A        2010-01-02      23
A        2010-01-03      24
B        2010-01-01      44
B        2010-01-02      55
B        2010-01-03      66
C        2010-01-01      100
C        2010-01-02      22
C        2010-01-03      33
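To reproduce the examples, the sample table can be built as below. This is a sketch that assumes t should be parsed as dates (the question shows it as plain text) and that the empty shout_tminus column is the one to be filled.

```python
import pandas as pd

# Sample data from the question, in long format: one row per (ticker, date).
# Assumption: t is converted to datetime rather than kept as strings.
df = pd.DataFrame({
    "ticker": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "t": pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03"] * 3),
    "shout_t": [22, 23, 24, 44, 55, 66, 100, 22, 33],
})
print(df)
```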

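I want to lag this DataFrame by 1 day within each ticker and compute the shout_tminus value. I would have used df.shift(1), but that is a mistake here, because it shifts across ticker boundaries (the first row of B would pick up the last value of A). Ideally I would like: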

A      2010-01-01      22     NA
A      2010-01-02      23     22
A      2010-01-03      24     23

where the last column is shout_tminus, and likewise for B and C. I did the following:

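ids = ['A','B','C']
# The column name must match the one filled in the loop ('shoutminus' never gets updated)
df['shout_tminus'] = None
for key in ids:
    # Boolean filter scans every row of df, so the loop costs roughly tickers x rows
    temp = df[df.ticker == key].copy()
    temp['shout_tminus'] = temp['shout_t'].shift(1)
    df.update(temp)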

The problem is that my DataFrame is large: it has 10 million rows, and running this loop for 1,000 tickers takes forever. Is there a faster way to shift a series correctly within each ticker of a long-format DataFrame? Thanks

Asked By: turtle_in_mind

Answer 1

If I understand correctly, is this what you are looking for? What should happen to the last row, though?

df['shout_tminus'] = df['shout_t'].shift()
df
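Note that this ungrouped shift ignores the ticker column, so each ticker's first row receives the previous ticker's last value. The sketch below, which reuses the question's sample data, shows that leak and then one vectorized fix that is not from either answer: shift the whole column once and blank out rows where the ticker changes. It assumes the frame is sorted by ticker and then by t.

```python
import pandas as pd

df = pd.DataFrame({
    "ticker": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "t": pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03"] * 3),
    "shout_t": [22, 23, 24, 44, 55, 66, 100, 22, 33],
})

# Ungrouped shift: each row gets the previous row's value, regardless of ticker.
naive = df["shout_t"].shift()
# The first B row picks up A's last value (24) instead of NaN.
print(naive.iloc[3])  # 24.0

# Mask-based fix (assumes rows are sorted by ticker, then t):
# keep the shifted value only where the previous row has the same ticker.
df = df.sort_values(["ticker", "t"])
same_ticker = df["ticker"].eq(df["ticker"].shift())
df["shout_tminus"] = df["shout_t"].shift().where(same_ticker)
print(df)
```

Because it is a single shift plus a comparison over whole columns, this avoids a Python-level loop over tickers, much like the groupby approach.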

Answered By: wwnde

Answer 2

All you need is to add a groupby('ticker'):

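df['shout_tminus'] = (
   df.sort_values(['ticker', 't'])  # order rows by date within each ticker
   .groupby('ticker')
   ['shout_t']
   .shift()  # lag by one row per ticker; each ticker's first row gets NaN
)
# The shifted Series keeps df's original index, so the assignment lines up
# with the right rows even if df itself is not sorted.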

Result:

ticker           t  shout_t  shout_tminus
     A  2010-01-01       22           NaN
     A  2010-01-02       23          22.0
     A  2010-01-03       24          23.0
     B  2010-01-01       44           NaN
     B  2010-01-02       55          44.0
     B  2010-01-03       66          55.0
     C  2010-01-01      100           NaN
     C  2010-01-02       22         100.0
     C  2010-01-03       33          22.0
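Both answers lag by one row, which equals a one-day lag only if every ticker has a row for every consecutive date. If dates can have gaps, one option (a sketch, not taken from either answer) is a self-merge: move each observation forward one calendar day and left-merge it back on ticker and t. The data here is hypothetical, and the sketch assumes each (ticker, t) pair appears at most once.

```python
import pandas as pd

# Hypothetical data with a gap: ticker A has no row for 2010-01-02.
df = pd.DataFrame({
    "ticker": ["A", "A", "B", "B", "B"],
    "t": pd.to_datetime(["2010-01-01", "2010-01-03",
                         "2010-01-01", "2010-01-02", "2010-01-03"]),
    "shout_t": [22, 24, 44, 55, 66],
})

# Move each observation forward one calendar day so it lines up with
# the row for which it is the previous day's value.
prev = df[["ticker", "t", "shout_t"]].copy()
prev["t"] = prev["t"] + pd.Timedelta(days=1)
prev = prev.rename(columns={"shout_t": "shout_tminus"})

# Left merge keeps df's rows in order; rows with no previous-day
# observation get NaN.
out = df.merge(prev, on=["ticker", "t"], how="left")

# For comparison: the row-based lag used in the answers above.
out["row_lag"] = out.groupby("ticker")["shout_t"].shift()
print(out)
```

Here the 2010-01-03 row for A gets NaN from the merge, whereas the row-based shift gives it 22, the value from 2010-01-01.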

Answered By: Code Different
