Compute pandas dataframe row difference for each group with incrementing period till preset value

Question:

With a dataframe

df = pd.DataFrame([['a', 3], ['a', 5], ['a', 2], ['a', 6], ['a', 7], ['a', 1], ['a', 9], ['b', 7], ['b', 8], ['b', 11], ['b', 9], ['b', 10], ['b', 6]], columns = ['k', 'v'])

I want to compute the row difference on column v for each group of column k with a period of 3. While for the first few rows in each group that have less than 3 values, we use an incrementing period starting at 1 till the preset value 3. The desired result would be as follows:

enter image description here

What’s a good "pandasonic" way to do this?

Asked By: Rock

||

Answers:

groupby + diff and use fillna

g = df.groupby('k')['v']
g.diff(3).fillna(g.diff(2).fillna(g.diff(1)))
Answered By: Panda Kim

Using .groupby() and .rolling() you can achieve what you want:

>>> grouped = (df.
               groupby('k')['v'].
               rolling(window=4, min_periods=1).
               apply(lambda x: x.iloc[-1] - x.iloc[0]).
               reset_index().
               rename(columns={'v': 'diff'}).drop('level_1', axis=1))

    k   diff
0   a   0.0
1   a   2.0
2   a   -1.0
3   a   3.0
4   a   2.0
5   a   -1.0
6   a   3.0
7   b   0.0
8   b   1.0
9   b   4.0
10  b   2.0
11  b   2.0
12  b   -5.0

By passing in min_periods=1 to .rolling() the window will start at 1 and increment until either the window value is reached or the max possible period is reached — whichever happens first.

You could then use .concat() or .merge() to join this back onto df.

Answered By: David Smith
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.