# Rolling difference in Pandas

## Question:

Does anyone know an efficient function/method, such as `pandas.rolling_mean`, that would calculate the rolling difference of an array?

This is my closest solution:

```python
roll_diff = pd.Series(values).diff(periods=1)
```

However, it only calculates a single-step rolling difference. Ideally the step size would be adjustable (i.e. the difference between the current time step and the `n` last steps).

I’ve also written this, but for larger arrays, it is quite slow:

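```python
import numpy as np

def roll_diff(values, step):
    diff = []
    for i in np.arange(step, len(values) - 1):
        # indices of the `step` values preceding position i
        pers_window = np.arange(i - 1, i - step - 1, -1)
        # absolute difference from the mean of those values
        diff.append(np.abs(values[i] - np.mean(values[pers_window])))
    # pad with zeros at the end so the output has the same length as the input
    diff = np.pad(diff, (0, step + 1), 'constant')
    return diff
```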
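The loop above compares each value with the mean of the `step` values that precede it. A vectorized sketch of the same calculation, assuming `values` is a 1-D numeric array (`roll_diff_vectorized` is a hypothetical name). Unlike the loop, it keeps each result at its own position, with NaN for the first `step` entries instead of zero padding at the end, and it also covers the last element:

```python
import numpy as np
import pandas as pd

def roll_diff_vectorized(values, step):
    s = pd.Series(values, dtype=float)
    # mean of the `step` values before each position: rolling mean shifted by one
    trailing_mean = s.rolling(window=step).mean().shift(1)
    return (s - trailing_mean).abs().to_numpy()

x = np.array([1, 3, 6, 1, -5, 6, 4, 1, 6])
r = roll_diff_vectorized(x, 2)
# r[0] and r[1] are NaN; r[2:] holds 4.0, 3.5, 8.5, 8.0, 3.5, 4.0, 3.5
```

Both the rolling mean and the shift run in compiled code, so this avoids the per-element Python loop.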

## Answer:

This should work:

```python
import numpy as np

x = np.array([1, 3, 6, 1, -5, 6, 4, 1, 6])

def running_diff(arr, N):
    # difference between each element and the one N positions earlier
    return np.array([arr[i] - arr[i - N] for i in range(N, len(arr))])

running_diff(x, 4)  # array([-6,  3, -2,  0, 11])
```
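The list comprehension above still loops in Python. Slicing computes the same lag-`N` differences in one vectorized NumPy operation, and pandas' `Series.diff` takes the lag through its `periods` argument (it puts NaN in the first `N` positions instead of dropping them). A minimal sketch; `running_diff_sliced` is a hypothetical name and assumes `N >= 1`:

```python
import numpy as np
import pandas as pd

x = np.array([1, 3, 6, 1, -5, 6, 4, 1, 6])

def running_diff_sliced(arr, N):
    # arr[N:] lines up each element with arr[:-N], the element N positions earlier
    # (assumes N >= 1; arr[:-0] would be empty)
    return arr[N:] - arr[:-N]

print(running_diff_sliced(x, 4))              # [-6  3 -2  0 11]
print(pd.Series(x).diff(periods=4).tolist())
# [nan, nan, nan, nan, -6.0, 3.0, -2.0, 0.0, 11.0]
```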

For a given `pd.Series` `s`, you will have to decide what you want for the first few items. The example below simply keeps the initial series values:

```python
# s is a pd.Series; its first 4 entries have no value 4 steps back, so keep them as-is
s_roll_diff = np.hstack((s.values[:4], running_diff(s.values, 4)))
```

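This works because you can assign a `np.array` of matching length directly to a `pd.DataFrame` column, e.g. for a column `s`, `df['s_roll_diff'] = np.hstack((df.s.values[:4], running_diff(df.s.values, 4)))`. Use bracket indexing here: assigning to a new attribute such as `df.s_roll_diff` does not create a column.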
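If you would rather stay in pandas, a comparable sketch (assuming a DataFrame `df` with a numeric column `s`; the sample values are made up) takes `Series.diff(periods=4)` and fills the leading NaN entries with the original values, which mirrors the `np.hstack` padding above. Caveats: `fillna` would also replace NaN results caused by missing values elsewhere in `s`, and the result has a float dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'s': [1, 3, 6, 1, -5, 6, 4, 1, 6]})

# lag-4 difference; the first 4 rows have no earlier value and become NaN
lagged = df['s'].diff(periods=4)
# keep the original values in those first rows, like the np.hstack version
df['s_roll_diff'] = lagged.fillna(df['s'])

# the same result computed with plain NumPy slicing, for comparison
arr = df['s'].to_numpy()
expected = np.hstack((arr[:4], arr[4:] - arr[:-4]))
print(np.array_equal(df['s_roll_diff'].to_numpy(), expected))  # True
```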

## Answer:

```python
import pandas as pd

x = pd.DataFrame(
    {'x_1': [0, 1, 2, 3, 0, 1, 2, 500]},
    index=[0, 1, 2, 3, 4, 5, 6, 7],
)

# raw=False passes each window as a Series, so positional .iloc works
x['x_1'].rolling(window=2).apply(lambda w: w.iloc[1] - w.iloc[0], raw=False)
```

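In general you can replace the `lambda` function with your own function, as long as it returns a single number for each window. Note that the first `window - 1` items (here just the first one) will be `NaN`, because those windows are not yet full.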
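For example, a named function works the same way. When the function only needs positions, passing `raw=True` hands each window over as a NumPy array instead of a Series, which the pandas documentation notes can perform much better for NumPy-style functions. A sketch with a hypothetical `last_minus_first` function:

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 2, 3, 0, 1, 2, 500])

def last_minus_first(window):
    # with raw=True, `window` is a NumPy array, so plain indexing is positional
    return window[-1] - window[0]

result = s.rolling(window=2).apply(last_minus_first, raw=True)
print(result.tolist())
# [nan, 1.0, 1.0, 1.0, -3.0, 1.0, 1.0, 498.0]
```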

### Update

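Defining the following:

```python
# x is the DataFrame from the snippet above
n_steps = 2

def my_fun(w):
    # last value in the window minus the first one
    return w.iloc[-1] - w.iloc[0]

# a window of n_steps + 1 points spans exactly n_steps steps
x['x_1'].rolling(window=n_steps + 1).apply(my_fun, raw=False)
```

you can compute the difference between each value and the value `n_steps` positions earlier. (With `window=n_steps` the lag would only be `n_steps - 1`.)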
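To double-check the window/lag relationship, a small sketch comparing the rolling version with `Series.diff(periods=n_steps)`, which computes the same lag directly (the data and `n_steps = 2` are only example values):

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 0, 1, 2, 500], dtype=float)
n_steps = 2

# a window of n_steps + 1 points: its first and last values are n_steps positions apart
rolled = s.rolling(window=n_steps + 1).apply(lambda w: w[-1] - w[0], raw=True)
direct = s.diff(periods=n_steps)

print(rolled.equals(direct))  # True
```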

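## Answer:

You can do the same thing as in https://stackoverflow.com/a/48345749/1011724 if you work directly on the underlying NumPy array:

```python
import numpy as np

diff_kernel = np.array([1, -1])
np.convolve(rs.to_numpy(), diff_kernel, 'same')
```

where `rs` is your pandas Series. With `'same'` mode the output has the same length as `rs`; its first element is just the first value of `rs`, and every later element is the difference from the previous value.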
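The same idea extends to a lag of `N` steps: a kernel with `1` at the start, `-1` at the end and zeros in between, combined with `'valid'` mode, yields exactly `arr[N:] - arr[:-N]`. A sketch, with `lag_kernel` as a hypothetical helper name:

```python
import numpy as np

def lag_kernel(N):
    # [1, 0, ..., 0, -1] of length N + 1; convolving gives arr[i] - arr[i - N]
    k = np.zeros(N + 1)
    k[0], k[-1] = 1, -1
    return k

arr = np.array([1, 3, 6, 1, -5, 6, 4, 1, 6])
out = np.convolve(arr, lag_kernel(4), 'valid')
print(out.tolist())  # [-6.0, 3.0, -2.0, 0.0, 11.0]
```

`'valid'` keeps only the positions where the kernel fully overlaps the data, so there is no partial first element to worry about.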

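## Answer:

If you get `KeyError: 0`, try with `iloc`. Windows passed to the function as a Series keep their original index labels, so `x[0]` is a label lookup that fails as soon as a window does not contain the label `0`; `iloc` indexes by position:

```python
import pandas as pd

x = pd.DataFrame(
    {'x_1': [0, 1, 2, 3, 0, 1, 2, 500]},
    index=[0, 1, 2, 3, 4, 5, 6, 7],
)

# positional access with .iloc works regardless of the window's index labels
x['x_1'].rolling(window=2).apply(lambda w: w.iloc[1] - w.iloc[0], raw=False)
```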
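To see why the label lookup fails, a small sketch that builds such a window by hand; `s.iloc[1:3]` stands in for the second window a `rolling(window=2)` call would pass with `raw=False`, since those windows keep the original index labels:

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3], index=[0, 1, 2, 3])

window = s.iloc[1:3]                     # labels 1 and 2, values 1 and 2
print(window.index.tolist())             # [1, 2]
print(window.iloc[1] - window.iloc[0])   # 1 (positional access works)

try:
    window[0]                            # label-based lookup: no label 0 here
except KeyError as err:
    print('KeyError:', err)              # KeyError: 0
```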

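## Answer:

Applying `numpy.diff`:

```python
import pandas as pd
import numpy as np

x = pd.DataFrame({'x_1': [0, 1, 2, 3, 0, 1, 2, 500]})

print(x)
#    x_1
# 0    0
# 1    1
# 2    2
# 3    3
# 4    0
# 5    1
# 6    2
# 7  500

# np.diff returns a length-1 array for a 2-point window; [0] turns it into
# the scalar that rolling().apply() expects
print(x['x_1'].rolling(window=2).apply(lambda w: np.diff(w)[0], raw=True))
# 0      NaN
# 1      1.0
# 2      1.0
# 3      1.0
# 4     -3.0
# 5      1.0
# 6      1.0
# 7    498.0
# Name: x_1, dtype: float64
```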

## Answer:

If you have unevenly spaced intervals or temporal gaps in your data, and you use a rolling window defined by a time offset (e.g. `'2D'`) rather than a number of periods, `x.iloc[-1] - x.iloc[0]` can easily return something you don't expect. Pandas can construct windows that contain exactly one point; for such a window `x.iloc[-1] == x.iloc[0]`, so its diff is 0.

Sometimes this is the desired outcome, but other times you might want to use the last-known value from before the start of each window.
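A small sketch of that effect, using made-up dates and a `'2D'` offset window (for offset windows pandas defaults to `min_periods=1` and right-closed windows). The jump from 1 to 5 on 2024-01-10 is invisible because that day's window holds only one observation:

```python
import pandas as pd

s = pd.Series(
    [1.0, 5.0, 6.0],
    index=pd.to_datetime(['2024-01-01', '2024-01-10', '2024-01-11']),
)

# each window covers (t - 2 days, t]
d = s.rolling('2D').apply(lambda w: w.iat[-1] - w.iat[0], raw=False)
print(d.tolist())  # [0.0, 0.0, 1.0]
```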

A general solution (perhaps not so efficient) is to first artificially construct an evenly-spaced series, interpolate or fill data as needed (e.g. using `Series.ffill`), and then use the `.rolling()` techniques described in other answers.

```python
import pandas as pd

# Data with temporal gaps
y = pd.Series(..., index=pd.DatetimeIndex(...))

# Your desired frequency (month end; pandas >= 2.2 spells this '1ME')
freq = '1M'

# Construct a new index with this frequency, using your data range
idx_artificial = pd.date_range(y.index.min(), y.index.max(), freq=freq)

# Artificially expand the data to the evenly spaced index
# New data points will be inserted with null/NaN values
# (reindex matches exact timestamps; if yours don't fall on idx_artificial,
# y.reindex(idx_artificial, method='ffill') takes the last value at or before each label)
y_artificial = y.reindex(idx_artificial)

# Fill the empty values with the last-known value
# This part will vary depending on your needs
y_artificial = y_artificial.ffill()

# Now compute the diffs on the forward-filled, evenly spaced data;
# on this index a 2-point window spans exactly one `freq` step
y_diff = y_artificial.rolling(window=2).apply(lambda w: w.iat[-1] - w.iat[0], raw=False)
```
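A runnable version of those steps with made-up daily data whose timestamps fall exactly on the new daily index. For contrast, it also applies a `'1D'` time window to the raw series, where every window holds a single point and the diff is always 0:

```python
import pandas as pd

# made-up daily observations with no data on 2024-01-02 and 2024-01-03
y = pd.Series(
    [1.0, 4.0, 10.0],
    index=pd.to_datetime(['2024-01-01', '2024-01-04', '2024-01-05']),
)

freq = 'D'
idx_artificial = pd.date_range(y.index.min(), y.index.max(), freq=freq)

# missing days become NaN, then take the last-known value
y_artificial = y.reindex(idx_artificial).ffill()
print(y_artificial.tolist())  # [1.0, 1.0, 1.0, 4.0, 10.0]

# one-day differences on the evenly spaced series
y_diff = y_artificial.rolling(window=2).apply(lambda w: w.iat[-1] - w.iat[0], raw=False)
print(y_diff.tolist())  # [nan, 0.0, 0.0, 3.0, 6.0]

# a time-based window on the raw data only ever sees one point here
naive = y.rolling('1D').apply(lambda w: w.iat[-1] - w.iat[0], raw=False)
print(naive.tolist())  # [0.0, 0.0, 0.0]
```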

And here are some helper functions to implement the above, for your copy-paste pleasure (warning: lightly-tested code written by a complete stranger, use with caution):

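```python
import pandas as pd

def date_range_from_index(index, freq=None, start=None, end=None, **kwargs):
    """Build an evenly spaced DatetimeIndex spanning `index`."""
    if start is None:
        start = index.min()
    if end is None:
        end = index.max()
    if freq is None:
        # fall back to the frequency stored on the index, if any
        try:
            freq = index.freq
        except AttributeError:
            freq = None
    if freq is None:
        raise ValueError('Frequency not provided and input has no set frequency.')
    return pd.date_range(start, end, freq=freq, **kwargs)

def fill_dtindex(y, freq=None, start=None, end=None, fill=None):
    """Reindex `y` onto an evenly spaced index and optionally fill the gaps."""
    new_index = date_range_from_index(y.index, freq=freq, start=start, end=end)
    y = y.reindex(new_index)
    if fill is not None:
        if isinstance(fill, str):
            # fillna(method=...) is deprecated in pandas >= 2.1; call ffill/bfill directly
            if fill in ('ffill', 'pad'):
                y = y.ffill()
            elif fill in ('bfill', 'backfill'):
                y = y.bfill()
            else:
                raise ValueError(f'Unknown fill method: {fill!r}')
        else:
            # any non-string value is used as a constant fill value
            y = y.fillna(fill)
    return y
```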