Pandas: Shift down values by one row within a group

Question:

I have a Pandas dataframe, and I want to create a new column whose values are that of another column, shifted down by one row. The last row should show NaN.

The catch is that I want to do this by group, with the last row of each group showing NaN. NOT have the last row of a group “steal” a value from a group that happens to be adjacent in the dataframe.

My attempted implementation is quite shamefully broken, so I’m clearly misunderstanding something fundamental.

df['B_shifted'] = df.groupby(['A'])['B'].transform(lambda x:x.values[1:])
Asked By: jeffalstott

||

Answers:

Shift works on the output of the groupby clause:

>>> df = pandas.DataFrame(numpy.random.randint(1,3, (10,5)), columns=['a','b','c','d','e'])
>>> df
   a  b  c  d  e
0  2  1  2  1  1
1  2  1  1  1  1
2  1  2  2  1  2
3  1  2  1  1  2
4  2  2  1  1  2
5  2  2  2  2  1
6  2  2  1  1  1
7  2  2  2  1  1
8  2  2  2  2  1
9  2  2  2  2  1


for k, v in df.groupby('a'):
    print k
    print 'normal'
    print v
    print 'shifted'
    print v.shift(1)

1
normal
   a  b  c  d  e
2  1  2  2  1  2
3  1  2  1  1  2
shifted
    a   b   c   d   e
2 NaN NaN NaN NaN NaN
3   1   2   2   1   2
2
normal
   a  b  c  d  e
0  2  1  2  1  1
1  2  1  1  1  1
4  2  2  1  1  2
5  2  2  2  2  1
6  2  2  1  1  1
7  2  2  2  1  1
8  2  2  2  2  1
9  2  2  2  2  1
shifted
    a   b   c   d   e
0 NaN NaN NaN NaN NaN
1   2   1   2   1   1
4   2   1   1   1   1
5   2   2   1   1   2
6   2   2   2   2   1
7   2   2   1   1   1
8   2   2   2   1   1
9   2   2   2   2   1
Answered By: Mike

@EdChum’s comment is a better answer to this question, so I’m posting it here for posterity:

df['B_shifted'] = df.groupby(['A'])['B'].transform(lambda x:x.shift())

or similarly

df['B_shifted'] = df.groupby(['A'])['B'].transform('shift').

The former notation is more flexible, of course (e.g. if you want to shift by 2).

Answered By: abeboparebop

Newer versions of pandas can now perform a shift on a group:

df['B_shifted'] = df.groupby(['A'])['B'].shift(1)

Note that when shifting down, it’s the first row that has NaN.

Answered By: chrisaycock
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.