Pandas rolling values
Question:
How do I obtain the rolling windows of some length n from a pandas Series, i.e. the values themselves rather than an aggregate?
For example, if I have the following:
df = pd.DataFrame({'temperature': [0, 1, 2, np.nan, 4, 2, 0.8, 4, 8.8, 7.12]})
how do I obtain the moving values of length n, i.e. something like, if n=3:
[NaN, NaN, 0], [NaN, 0, 1],…, [4, 8.8, 7.12]
EDIT:
If I use pandas rolling, as:
roll = pd.Series.rolling(df, 3).mean()
then roll holds the moving averages of the series.
Here, I do not want the average of every moving set of 3 values, but the sets of 3 values themselves.
Answers:
I think you first need to prepend NaNs and then use this solution:
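import numpy as np
N = 3
# Prepend N-1 NaNs so that the first window ends at the first real value
x = np.concatenate([[np.nan] * (N-1), df['temperature'].values])
def rolling_window(a, window):
    # Build a view of overlapping windows along the last axis (no data is copied)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
print(rolling_window(x, N))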
[[ nan nan 0. ]
[ nan 0. 1. ]
[ 0. 1. 2. ]
[ 1. 2. nan]
[ 2. nan 4. ]
[ nan 4. 2. ]
[ 4. 2. 0.8 ]
[ 2. 0.8 4. ]
[ 0.8 4. 8.8 ]
[ 4. 8.8 7.12]]
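If you are on NumPy 1.20 or newer, the same windows can probably be built without computing strides by hand. The following is only a sketch under that version assumption; it uses sliding_window_view, and the names padded and windows are just illustrative:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame({'temperature': [0, 1, 2, np.nan, 4, 2, 0.8, 4, 8.8, 7.12]})

N = 3
# Prepend N-1 NaNs so that the first window ends at the first real value.
padded = np.concatenate([np.full(N - 1, np.nan), df['temperature'].to_numpy()])
# Read-only view with one row per original value and N columns; no data is copied.
windows = sliding_window_view(padded, N)
print(windows)
```

The result is a read-only view, so call windows.copy() before modifying it.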
Even though the thread is old, maybe this will help someone else. I’m a beginner, but I solved user5805065’s question with the following procedure. Maybe someone can make it more elegant and efficient.
- converting Pandas series to NumPy:
rollTemperature = df['temperature'].values
- then creating a NumPy array in a for loop with some initial variables:
period = 2
stop = len(rollTemperature)
diffRoll = np.zeros(stop)
for i in range(0, stop):
    if i == 0:
        diffRoll[i] = np.nan
    elif np.mod(i, period) != 0:
        diffRoll[i] = np.nan
    else:
        # mean of the current value and the value `period` steps earlier
        diffRoll[i] = (rollTemperature[i] + rollTemperature[i-period])/2
- then adding the NumPy array to the existing DataFrame:
df['diffRoll'] = diffRoll
Then the output is:
   temperature  diffRoll
0         0.00       NaN
1         1.00       NaN
2         2.00       1.0
3          NaN       NaN
4         4.00       3.0
5         2.00       NaN
6         0.80       2.4
7         4.00       NaN
8         8.80       4.8
9         7.12       NaN
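As for making the loop more elegant, one possible vectorized version is sketched below. It assumes the intent is exactly what the loop does (the mean of each value and the value period rows earlier, kept only at positions that are multiples of period); pair_mean and positions are names made up for this sketch. Note that this reproduces the diffRoll column, not the windows of values asked for in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'temperature': [0, 1, 2, np.nan, 4, 2, 0.8, 4, 8.8, 7.12]})

period = 2
s = df['temperature']
# Mean of each value and the value `period` rows earlier;
# the first `period` rows have no earlier partner and become NaN.
pair_mean = (s + s.shift(period)) / 2
# Keep only rows whose position is a multiple of `period`, as the loop does.
positions = np.arange(len(s))
df['diffRoll'] = pair_mean.where(positions % period == 0)
print(df)
```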
# df1 is assumed to be the single-column DataFrame from the question (called df there)
(pd.concat([df1.shift(i) for i in range(3)], axis=1)
   .loc[:, ::-1]
   .agg(list, axis=1))
0 [nan, nan, 0.0]
1 [nan, 0.0, 1.0]
2 [0.0, 1.0, 2.0]
3 [1.0, 2.0, nan]
4 [2.0, nan, 4.0]
5 [nan, 4.0, 2.0]
6 [4.0, 2.0, 0.8]
7 [2.0, 0.8, 4.0]
8 [0.8, 4.0, 8.8]
9 [4.0, 8.8, 7.12]
dtype: object
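To spell out how the shift approach works: shift(k) moves every value down by k rows, so placing the copies shifted by n-1, ..., 1, 0 side by side puts each window on one row, oldest value first. Below is a rough, self-contained sketch of that idea for a general n; the names lagged and windows are only illustrative, and it builds the lists through NumPy instead of agg:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'temperature': [0, 1, 2, np.nan, 4, 2, 0.8, 4, 8.8, 7.12]})

n = 3
s = df['temperature']
# Column order n-1, ..., 1, 0 so that each row reads oldest value to newest.
lagged = pd.concat([s.shift(k) for k in range(n - 1, -1, -1)], axis=1)
# One Python list per row, stored in an object-dtype Series with the original index.
windows = pd.Series(lagged.to_numpy().tolist(), index=s.index)
print(windows)
```

Each window is an independent list here, so this copies the data n times; for long series the NumPy view-based approach is likely cheaper.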