Rolling or sliding window iterator?
Question:
I need a rolling window (aka sliding window) iterable over a sequence/iterator/generator. (Default Python iteration could be considered a special case, where the window length is 1.) I’m currently using the following code. How can I do it more elegantly and/or efficiently?
def rolling_window(seq, window_size):
it = iter(seq)
win = [it.next() for cnt in xrange(window_size)] # First window
yield win
for e in it: # Subsequent windows
win[:-1] = win[1:]
win[-1] = e
yield win
if __name__=="__main__":
for w in rolling_window(xrange(6), 3):
print w
"""Example output:
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
"""
For the specific case of window_size == 2
(i.e., iterating over adjacent, overlapping pairs in a sequence), see also How can I iterate over overlapping (current, next) pairs of values from a list?.
Answers:
This seems tailor-made for a collections.deque
since you essentially have a FIFO (add to one end, remove from the other). However, even if you use a list
you shouldn’t be slicing twice; instead, you should probably just pop(0)
from the list and append()
the new item.
Here is an optimized deque-based implementation patterned after your original:
from collections import deque
def window(seq, n=2):
it = iter(seq)
win = deque((next(it, None) for _ in xrange(n)), maxlen=n)
yield win
append = win.append
for e in it:
append(e)
yield win
In my tests it handily beats everything else posted here most of the time, though pillmuncher’s tee
version beats it for large iterables and small windows. On larger windows, the deque
pulls ahead again in raw speed.
Access to individual items in the deque
may be faster or slower than with lists or tuples. (Items near the beginning are faster, or items near the end if you use a negative index.) I put a sum(w)
in the body of my loop; this plays to the deque’s strength (iterating from one item to the next is fast, so this loop ran a a full 20% faster than the next fastest method, pillmuncher’s). When I changed it to individually look up and add items in a window of ten, the tables turned and the tee
method was 20% faster. I was able to recover some speed by using negative indexes for the last five terms in the addition, but tee
was still a little faster. Overall I would estimate that either one is plenty fast for most uses and if you need a little more performance, profile and pick the one that works best.
There’s one in an old version of the Python docs with itertools
examples:
from itertools import islice
def window(seq, n=2):
"Returns a sliding window (of width n) over data from the iterable"
" s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... "
it = iter(seq)
result = tuple(islice(it, n))
if len(result) == n:
yield result
for elem in it:
result = result[1:] + (elem,)
yield result
The one from the docs is a little more succinct and uses itertools
to greater effect I imagine.
If your iterator is a simple list/tuple a simple way to slide through it with a specified window size would be:
seq = [0, 1, 2, 3, 4, 5]
window_size = 3
for i in range(len(seq) - window_size + 1):
print(seq[i: i + window_size])
Output:
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
I like tee()
:
from itertools import tee, izip
def window(iterable, size):
iters = tee(iterable, size)
for i in xrange(1, size):
for each in iters[i:]:
next(each, None)
return izip(*iters)
for each in window(xrange(6), 3):
print list(each)
gives:
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
I use the following code as a simple sliding window that uses generators to drastically increase readability. Its speed has so far been sufficient for use in bioinformatics sequence analysis in my experience.
I include it here because I didn’t see this method used yet. Again, I make no claims about its compared performance.
def slidingWindow(sequence,winSize,step=1):
"""Returns a generator that will iterate through
the defined chunks of input sequence. Input sequence
must be sliceable."""
# Verify the inputs
if not ((type(winSize) == type(0)) and (type(step) == type(0))):
raise Exception("**ERROR** type(winSize) and type(step) must be int.")
if step > winSize:
raise Exception("**ERROR** step must not be larger than winSize.")
if winSize > len(sequence):
raise Exception("**ERROR** winSize must not be larger than sequence length.")
# Pre-compute number of chunks to emit
numOfChunks = ((len(sequence)-winSize)/step)+1
# Do the work
for i in range(0,numOfChunks*step,step):
yield sequence[i:i+winSize]
Just a quick contribution.
Since the current python docs don’t have “window” in the itertool examples (i.e., at the bottom of http://docs.python.org/library/itertools.html), here’s an snippet based on the
code for grouper which is one of the examples given:
import itertools as it
def window(iterable, size):
shiftedStarts = [it.islice(iterable, s, None) for s in xrange(size)]
return it.izip(*shiftedStarts)
Basically, we create a series of sliced iterators, each with a starting point one spot further forward. Then, we zip these together. Note, this function returns a generator (it is not directly a generator itself).
Much like the appending-element and advancing-iterator versions above, the performance (i.e., which is best) varies with list size and window size. I like this one because it is a two-liner (it could be a one-liner, but I prefer naming concepts).
It turns out that the above code is wrong. It works if the parameter passed to iterable is a sequence but not if it is an iterator. If it is an iterator, the same iterator is shared (but not tee’d) among the islice calls and this breaks things badly.
Here is some fixed code:
import itertools as it
def window(iterable, size):
itrs = it.tee(iterable, size)
shiftedStarts = [it.islice(anItr, s, None) for s, anItr in enumerate(itrs)]
return it.izip(*shiftedStarts)
Also, one more version for the books. Instead of copying an iterator and then advancing copies many times, this version makes pairwise copies of each iterator as we move the starting position forward. Thus, iterator t provides both the “complete” iterator with starting point at t and also the basis for creating iterator t + 1:
import itertools as it
def window4(iterable, size):
complete_itr, incomplete_itr = it.tee(iterable, 2)
iters = [complete_itr]
for i in xrange(1, size):
incomplete_itr.next()
complete_itr, incomplete_itr = it.tee(incomplete_itr, 2)
iters.append(complete_itr)
return it.izip(*iters)
Here’s a generalization that adds support for step
, fillvalue
parameters:
from collections import deque
from itertools import islice
def sliding_window(iterable, size=2, step=1, fillvalue=None):
if size < 0 or step < 1:
raise ValueError
it = iter(iterable)
q = deque(islice(it, size), maxlen=size)
if not q:
return # empty iterable or size == 0
q.extend(fillvalue for _ in range(size - len(q))) # pad to size
while True:
yield iter(q) # iter() to avoid accidental outside modifications
try:
q.append(next(it))
except StopIteration: # Python 3.5 pep 479 support
return
q.extend(next(it, fillvalue) for _ in range(step - 1))
It yields in chunks size
items at a time rolling step
positions per iteration padding each chunk with fillvalue
if necessary. Example for size=4, step=3, fillvalue='*'
:
[a b c d]e f g h i j k l m n o p q r s t u v w x y z
a b c[d e f g]h i j k l m n o p q r s t u v w x y z
a b c d e f[g h i j]k l m n o p q r s t u v w x y z
a b c d e f g h i[j k l m]n o p q r s t u v w x y z
a b c d e f g h i j k l[m n o p]q r s t u v w x y z
a b c d e f g h i j k l m n o[p q r s]t u v w x y z
a b c d e f g h i j k l m n o p q r[s t u v]w x y z
a b c d e f g h i j k l m n o p q r s t u[v w x y]z
a b c d e f g h i j k l m n o p q r s t u v w x[y z * *]
For an example of use case for the step
parameter, see Processing a large .txt file in python efficiently.
a slightly modified version of the deque window, to make it a true rolling window. So that it starts being populated with just one element, then grows to it’s maximum window size, and then shrinks as it’s left edge comes near the end:
from collections import deque
def window(seq, n=2):
it = iter(seq)
win = deque((next(it, None) for _ in xrange(1)), maxlen=n)
yield win
append = win.append
for e in it:
append(e)
yield win
for _ in xrange(len(win)-1):
win.popleft()
yield win
for wnd in window(range(5), n=3):
print(list(wnd))
this gives
[0]
[0, 1]
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4]
[4]
>>> n, m = 6, 3
>>> k = n - m+1
>>> print ('{}n'*(k)).format(*[range(i, i+m) for i in xrange(k)])
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
Multiple iterators!
def window(seq, size, step=1):
# initialize iterators
iters = [iter(seq) for i in range(size)]
# stagger iterators (without yielding)
[next(iters[i]) for j in range(size) for i in range(-1, -j-1, -1)]
while(True):
yield [next(i) for i in iters]
# next line does nothing for step = 1 (skips iterations for step > 1)
[next(i) for i in iters for j in range(step-1)]
next(it)
raises StopIteration
when the sequence is finished, and for some cool reason that’s beyond me, the yield statement here excepts it and the function returns, ignoring the leftover values that don’t form a full window.
Anyway, this is the least-lines solution yet whose only requirement is that seq
implement either __iter__
or __getitem__
and doesn’t rely on itertools
or collections
besides @dansalmo’s solution 🙂
def rolling_window(list, degree):
for i in range(len(list)-degree+1):
yield [list[i+o] for o in range(degree)]
Made this for a rolling average function
why not
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = tee(iterable)
next(b, None)
return zip(a, b)
It is documented in Python doc .
You can easily extend it to wider window.
How about using the following:
mylist = [1, 2, 3, 4, 5, 6, 7]
def sliding_window(l, window_size=2):
if window_size > len(l):
raise ValueError("Window size must be smaller or equal to the number of elements in the list.")
t = []
for i in xrange(0, window_size):
t.append(l[i:])
return zip(*t)
print sliding_window(mylist, 3)
Output:
[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7)]
def GetShiftingWindows(thelist, size):
return [ thelist[x:x+size] for x in range( len(thelist) - size + 1 ) ]
>> a = [1, 2, 3, 4, 5]
>> GetShiftingWindows(a, 3)
[ [1, 2, 3], [2, 3, 4], [3, 4, 5] ]
Just to show how you can combine itertools
recipes, I’m extending the pairwise
recipe as directly as possible back into the window
recipe using the consume
recipe:
def consume(iterator, n):
"Advance the iterator n-steps ahead. If n is none, consume entirely."
# Use functions that consume iterators at C speed.
if n is None:
# feed the entire iterator into a zero-length deque
collections.deque(iterator, maxlen=0)
else:
# advance to the empty slice starting at position n
next(islice(iterator, n, n), None)
def window(iterable, n=2):
"s -> (s0, ...,s(n-1)), (s1, ...,sn), (s2, ..., s(n+1)), ..."
iters = tee(iterable, n)
# Could use enumerate(islice(iters, 1, None), 1) to avoid consume(it, 0), but that's
# slower for larger window sizes, while saving only small fixed "noop" cost
for i, it in enumerate(iters):
consume(it, i)
return zip(*iters)
The window
recipe is the same as for pairwise
, it just replaces the single element “consume” on the second tee
-ed iterator with progressively increasing consumes on n - 1
iterators. Using consume
instead of wrapping each iterator in islice
is marginally faster (for sufficiently large iterables) since you only pay the islice
wrapping overhead during the consume
phase, not during the process of extracting each window-ed value (so it’s bounded by n
, not the number of items in iterable
).
Performance-wise, compared to some other solutions, this is pretty good (and better than any of the other solutions I tested as it scales). Tested on Python 3.5.0, Linux x86-64, using ipython
%timeit
magic.
kindall’s the deque
solution, tweaked for performance/correctness by using islice
instead of a home-rolled generator expression and testing the resulting length so it doesn’t yield results when the iterable is shorter than the window, as well as passing the maxlen
of the deque
positionally instead of by keyword (makes a surprising difference for smaller inputs):
>>> %timeit -r5 deque(windowkindall(range(10), 3), 0)
100000 loops, best of 5: 1.87 μs per loop
>>> %timeit -r5 deque(windowkindall(range(1000), 3), 0)
10000 loops, best of 5: 72.6 μs per loop
>>> %timeit -r5 deque(windowkindall(range(1000), 30), 0)
1000 loops, best of 5: 71.6 μs per loop
Same as previous adapted kindall solution, but with each yield win
changed to yield tuple(win)
so storing results from the generator works without all stored results really being a view of the most recent result (all other reasonable solutions are safe in this scenario), and adding tuple=tuple
to the function definition to move use of tuple
from the B
in LEGB
to the L
:
>>> %timeit -r5 deque(windowkindalltupled(range(10), 3), 0)
100000 loops, best of 5: 3.05 μs per loop
>>> %timeit -r5 deque(windowkindalltupled(range(1000), 3), 0)
10000 loops, best of 5: 207 μs per loop
>>> %timeit -r5 deque(windowkindalltupled(range(1000), 30), 0)
1000 loops, best of 5: 348 μs per loop
consume
-based solution shown above:
>>> %timeit -r5 deque(windowconsume(range(10), 3), 0)
100000 loops, best of 5: 3.92 μs per loop
>>> %timeit -r5 deque(windowconsume(range(1000), 3), 0)
10000 loops, best of 5: 42.8 μs per loop
>>> %timeit -r5 deque(windowconsume(range(1000), 30), 0)
1000 loops, best of 5: 232 μs per loop
Same as consume
, but inlining else
case of consume
to avoid function call and n is None
test to reduce runtime, particularly for small inputs where the setup overhead is a meaningful part of the work:
>>> %timeit -r5 deque(windowinlineconsume(range(10), 3), 0)
100000 loops, best of 5: 3.57 μs per loop
>>> %timeit -r5 deque(windowinlineconsume(range(1000), 3), 0)
10000 loops, best of 5: 40.9 μs per loop
>>> %timeit -r5 deque(windowinlineconsume(range(1000), 30), 0)
1000 loops, best of 5: 211 μs per loop
(Side-note: A variant on pairwise
that uses tee
with the default argument of 2 repeatedly to make nested tee
objects, so any given iterator is only advanced once, not independently consumed an increasing number of times, similar to MrDrFenner’s answer is similar to non-inlined consume
and slower than the inlined consume
on all tests, so I’ve omitted it those results for brevity).
As you can see, if you don’t care about the possibility of the caller needing to store results, my optimized version of kindall’s solution wins most of the time, except in the “large iterable, small window size case” (where inlined consume
wins); it degrades quickly as the iterable size increases, while not degrading at all as the window size increases (every other solution degrades more slowly for iterable size increases, but also degrades for window size increases). It can even be adapted for the “need tuples” case by wrapping in map(tuple, ...)
, which runs ever so slightly slower than putting the tupling in the function, but it’s trivial (takes 1-5% longer) and lets you keep the flexibility of running faster when you can tolerate repeatedly returning the same value.
If you need safety against returns being stored, inlined consume
wins on all but the smallest input sizes (with non-inlined consume
being slightly slower but scaling similarly). The deque
& tupling based solution wins only for the smallest inputs, due to smaller setup costs, and the gain is small; it degrades badly as the iterable gets longer.
For the record, the adapted version of kindall’s solution that yield
s tuple
s I used was:
def windowkindalltupled(iterable, n=2, tuple=tuple):
it = iter(iterable)
win = deque(islice(it, n), n)
if len(win) < n:
return
append = win.append
yield tuple(win)
for e in it:
append(e)
yield tuple(win)
Drop the caching of tuple
in the function definition line and the use of tuple
in each yield
to get the faster but less safe version.
This is an old question but for those still interested there is a great implementation of a window slider using generators in this page (by Adrian Rosebrock).
It is an implementation for OpenCV however you can easily use it for any other purpose. For the eager ones i’ll paste the code here but to understand it better I recommend visiting the original page.
def sliding_window(image, stepSize, windowSize):
# slide a window across the image
for y in xrange(0, image.shape[0], stepSize):
for x in xrange(0, image.shape[1], stepSize):
# yield the current window
yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])
Tip: You can check the .shape
of the window when iterating the generator to discard those that do not meet your requirements
Cheers
There is a library which does exactly what you need:
import more_itertools
list(more_itertools.windowed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],n=3, step=3))
Out: [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]
Modified DiPaolo’s answer to allow arbitrary fill and variable step size
import itertools
def window(seq, n=2,step=1,fill=None,keep=0):
"Returns a sliding window (of width n) over data from the iterable"
" s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... "
it = iter(seq)
result = tuple(itertools.islice(it, n))
if len(result) == n:
yield result
while True:
# for elem in it:
elem = tuple( next(it, fill) for _ in range(step))
result = result[step:] + elem
if elem[-1] is fill:
if keep:
yield result
break
yield result
#Importing the numpy library
import numpy as np
arr = np.arange(6) #Sequence
window_size = 3
np.lib.stride_tricks.as_strided(arr, shape= (len(arr) - window_size +1, window_size),
strides = arr.strides*2)
"""Example output:
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
“””
Let’s make it lazy!
from itertools import islice, tee
def window(iterable, size):
iterators = tee(iterable, size)
iterators = [islice(iterator, i, None) for i, iterator in enumerate(iterators)]
yield from zip(*iterators)
list(window(range(5), 3))
# [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
here is a one liner. I timed it and it’s comprable to the performance of the top answer and gets progressively better with larger seq from 20% slower with len(seq) = 20 and 7% slower with len(seq) = 10000
zip(*[seq[i:(len(seq) - n - 1 + i)] for i in range(n)])
Trying my part, simple, one liner, pythonic way using islice. But, may not be optimally efficient.
from itertools import islice
array = range(0, 10)
window_size = 4
map(lambda i: list(islice(array, i, i + window_size)), range(0, len(array) - window_size + 1))
# output = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8], [6, 7, 8, 9]]
Explanation:
Create window by using islice of window_size and iterate this operation using map over all array.
I tested a few solutions along with the one I came up with. I found the one I came up with to be the fastest so I thought I would share my python3 implementation.
import itertools
import sys
def windowed(l, stride):
return zip(*[itertools.islice(l, i, sys.maxsize) for i in range(stride)])
Optimized Function for sliding window data in Deep learning
def SlidingWindow(X, window_length, stride):
indexer = np.arange(window_length)[None, :] + stride*np.arange(int(len(X)/stride)-window_length+4)[:, None]
return X.take(indexer)
to apply on multidimensional array
import numpy as np
def SlidingWindow(X, window_length, stride1):
stride= X.shape[1]*stride1
window_length = window_length*X.shape[1]
indexer = np.arange(window_length)[None, :] + stride1*np.arange(int(len(X)/stride1)-window_length-1)[:, None]
return X.take(indexer)
my two versions of window
implementation
from typing import Sized, Iterable
def window(seq: Sized, n: int, strid: int = 1, drop_last: bool = False):
for i in range(0, len(seq), strid):
res = seq[i:i + n]
if drop_last and len(res) < n:
break
yield res
def window2(seq: Iterable, n: int, strid: int = 1, drop_last: bool = False):
it = iter(seq)
result = []
step = 0
for i, ele in enumerate(it):
result.append(ele)
result = result[-n:]
if len(result) == n:
if step % strid == 0:
yield result
step += 1
if not drop_last:
yield result
Another simple way to generate window of fixed length from a list
from collections import deque
def window(ls,window_size=3):
window = deque(maxlen=window_size)
for element in ls:
if len(window)==window_size:
yield list(window)
window.append(element)
ls = [0,1,2,3,4,5]
for w in window(ls):
print(w)
My (keep it simple) solution that I ended up using:
def sliding_window(items, size):
return [items[start:end] for start, end
in zip(range(0, len(items) - size + 1), range(size, len(items) + 1))]
Needless to say, the items
sequence needs to be sliceable. Working with indices is not ideal, but it seems to be the least bad option given the alternatives… This can also easily be changed to a generator: just replace [...]
with (...)
.
In Python 3.10, we have the itertools.pairwise(iterable)
function to slide a window with two elements:
Here’s the doc :
Return successive overlapping pairs taken from the input iterable.
The number of 2-tuples in the output iterator will be one fewer than the number of inputs. It will be empty if the input iterable has fewer than two values.
Roughly equivalent to:
def pairwise(iterable):
# pairwise('ABCDEFG') --> AB BC CD DE EF FG
a, b = tee(iterable)
next(b, None)
return zip(a, b)
Update
This is a duplicated answer here, as Kelly find out. But I am leaving this here as a counter-example since I include a pointless min
.
So if you feel tempted to use min to avoid IndexError
don’t, there is no need, range
will handle that case for you.
Old answer
Curiously the following handle automatically when n > len(l)
returning []
which is semantically correct.
>>> l = [0, 1, 2, 3, 4]
>>> n = 2
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> [[0, 1], [1, 2], [2, 3], [3, 4]]
>>>
>>> n = 3
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
>>>
>>> n = 4
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> [[0, 1, 2, 3], [1, 2, 3, 4]]
>>>
>>> n = 5
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> [[0, 1, 2, 3, 4]]
>>>
>>> n = 10 # n > len(l)
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> []
I find this solution more elegant than using built-in functions.
words = ["this", "is", "an", "example"]
def get_sliding_windows(doc, sliding_window, padded=False):
all_windows = []
for i in range(sliding_window):
front = sliding_window-i
all_windows.append(front*['']+doc+i*[''])
if padded:
return np.array(all_windows).transpose()[1:]
else:
return np.array(all_windows).transpose()[sliding_window:-1]
>>> get_sliding_windows(words,3)
>>> array([['this', 'is', 'an'],
['is', 'an', 'example'],
['an', 'example', '']], dtype='<U7')
>>> get_padded_sliding_windows(words,3, True)
>>> array([['', '', 'this'],
['', 'this', 'is'],
['this', 'is', 'an'],
['is', 'an', 'example'],
['an', 'example', ''],
['example', '', '']], dtype='<U7')
I need a rolling window (aka sliding window) iterable over a sequence/iterator/generator. (Default Python iteration could be considered a special case, where the window length is 1.) I’m currently using the following code. How can I do it more elegantly and/or efficiently?
def rolling_window(seq, window_size):
it = iter(seq)
win = [it.next() for cnt in xrange(window_size)] # First window
yield win
for e in it: # Subsequent windows
win[:-1] = win[1:]
win[-1] = e
yield win
if __name__=="__main__":
for w in rolling_window(xrange(6), 3):
print w
"""Example output:
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
"""
For the specific case of window_size == 2
(i.e., iterating over adjacent, overlapping pairs in a sequence), see also How can I iterate over overlapping (current, next) pairs of values from a list?.
This seems tailor-made for a collections.deque
since you essentially have a FIFO (add to one end, remove from the other). However, even if you use a list
you shouldn’t be slicing twice; instead, you should probably just pop(0)
from the list and append()
the new item.
Here is an optimized deque-based implementation patterned after your original:
from collections import deque
def window(seq, n=2):
it = iter(seq)
win = deque((next(it, None) for _ in xrange(n)), maxlen=n)
yield win
append = win.append
for e in it:
append(e)
yield win
In my tests it handily beats everything else posted here most of the time, though pillmuncher’s tee
version beats it for large iterables and small windows. On larger windows, the deque
pulls ahead again in raw speed.
Access to individual items in the deque
may be faster or slower than with lists or tuples. (Items near the beginning are faster, or items near the end if you use a negative index.) I put a sum(w)
in the body of my loop; this plays to the deque’s strength (iterating from one item to the next is fast, so this loop ran a a full 20% faster than the next fastest method, pillmuncher’s). When I changed it to individually look up and add items in a window of ten, the tables turned and the tee
method was 20% faster. I was able to recover some speed by using negative indexes for the last five terms in the addition, but tee
was still a little faster. Overall I would estimate that either one is plenty fast for most uses and if you need a little more performance, profile and pick the one that works best.
There’s one in an old version of the Python docs with itertools
examples:
from itertools import islice
def window(seq, n=2):
"Returns a sliding window (of width n) over data from the iterable"
" s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... "
it = iter(seq)
result = tuple(islice(it, n))
if len(result) == n:
yield result
for elem in it:
result = result[1:] + (elem,)
yield result
The one from the docs is a little more succinct and uses itertools
to greater effect I imagine.
If your iterator is a simple list/tuple a simple way to slide through it with a specified window size would be:
seq = [0, 1, 2, 3, 4, 5]
window_size = 3
for i in range(len(seq) - window_size + 1):
print(seq[i: i + window_size])
Output:
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
I like tee()
:
from itertools import tee, izip
def window(iterable, size):
iters = tee(iterable, size)
for i in xrange(1, size):
for each in iters[i:]:
next(each, None)
return izip(*iters)
for each in window(xrange(6), 3):
print list(each)
gives:
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
I use the following code as a simple sliding window that uses generators to drastically increase readability. Its speed has so far been sufficient for use in bioinformatics sequence analysis in my experience.
I include it here because I didn’t see this method used yet. Again, I make no claims about its compared performance.
def slidingWindow(sequence,winSize,step=1):
"""Returns a generator that will iterate through
the defined chunks of input sequence. Input sequence
must be sliceable."""
# Verify the inputs
if not ((type(winSize) == type(0)) and (type(step) == type(0))):
raise Exception("**ERROR** type(winSize) and type(step) must be int.")
if step > winSize:
raise Exception("**ERROR** step must not be larger than winSize.")
if winSize > len(sequence):
raise Exception("**ERROR** winSize must not be larger than sequence length.")
# Pre-compute number of chunks to emit
numOfChunks = ((len(sequence)-winSize)/step)+1
# Do the work
for i in range(0,numOfChunks*step,step):
yield sequence[i:i+winSize]
Just a quick contribution.
Since the current python docs don’t have “window” in the itertool examples (i.e., at the bottom of http://docs.python.org/library/itertools.html), here’s an snippet based on the
code for grouper which is one of the examples given:
import itertools as it
def window(iterable, size):
shiftedStarts = [it.islice(iterable, s, None) for s in xrange(size)]
return it.izip(*shiftedStarts)
Basically, we create a series of sliced iterators, each with a starting point one spot further forward. Then, we zip these together. Note, this function returns a generator (it is not directly a generator itself).
Much like the appending-element and advancing-iterator versions above, the performance (i.e., which is best) varies with list size and window size. I like this one because it is a two-liner (it could be a one-liner, but I prefer naming concepts).
It turns out that the above code is wrong. It works if the parameter passed to iterable is a sequence but not if it is an iterator. If it is an iterator, the same iterator is shared (but not tee’d) among the islice calls and this breaks things badly.
Here is some fixed code:
import itertools as it
def window(iterable, size):
itrs = it.tee(iterable, size)
shiftedStarts = [it.islice(anItr, s, None) for s, anItr in enumerate(itrs)]
return it.izip(*shiftedStarts)
Also, one more version for the books. Instead of copying an iterator and then advancing copies many times, this version makes pairwise copies of each iterator as we move the starting position forward. Thus, iterator t provides both the “complete” iterator with starting point at t and also the basis for creating iterator t + 1:
import itertools as it
def window4(iterable, size):
complete_itr, incomplete_itr = it.tee(iterable, 2)
iters = [complete_itr]
for i in xrange(1, size):
incomplete_itr.next()
complete_itr, incomplete_itr = it.tee(incomplete_itr, 2)
iters.append(complete_itr)
return it.izip(*iters)
Here’s a generalization that adds support for step
, fillvalue
parameters:
from collections import deque
from itertools import islice
def sliding_window(iterable, size=2, step=1, fillvalue=None):
if size < 0 or step < 1:
raise ValueError
it = iter(iterable)
q = deque(islice(it, size), maxlen=size)
if not q:
return # empty iterable or size == 0
q.extend(fillvalue for _ in range(size - len(q))) # pad to size
while True:
yield iter(q) # iter() to avoid accidental outside modifications
try:
q.append(next(it))
except StopIteration: # Python 3.5 pep 479 support
return
q.extend(next(it, fillvalue) for _ in range(step - 1))
It yields in chunks size
items at a time rolling step
positions per iteration padding each chunk with fillvalue
if necessary. Example for size=4, step=3, fillvalue='*'
:
[a b c d]e f g h i j k l m n o p q r s t u v w x y z
a b c[d e f g]h i j k l m n o p q r s t u v w x y z
a b c d e f[g h i j]k l m n o p q r s t u v w x y z
a b c d e f g h i[j k l m]n o p q r s t u v w x y z
a b c d e f g h i j k l[m n o p]q r s t u v w x y z
a b c d e f g h i j k l m n o[p q r s]t u v w x y z
a b c d e f g h i j k l m n o p q r[s t u v]w x y z
a b c d e f g h i j k l m n o p q r s t u[v w x y]z
a b c d e f g h i j k l m n o p q r s t u v w x[y z * *]
For an example of use case for the step
parameter, see Processing a large .txt file in python efficiently.
a slightly modified version of the deque window, to make it a true rolling window. So that it starts being populated with just one element, then grows to it’s maximum window size, and then shrinks as it’s left edge comes near the end:
from collections import deque
def window(seq, n=2):
it = iter(seq)
win = deque((next(it, None) for _ in xrange(1)), maxlen=n)
yield win
append = win.append
for e in it:
append(e)
yield win
for _ in xrange(len(win)-1):
win.popleft()
yield win
for wnd in window(range(5), n=3):
print(list(wnd))
this gives
[0]
[0, 1]
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4]
[4]
>>> n, m = 6, 3
>>> k = n - m+1
>>> print ('{}n'*(k)).format(*[range(i, i+m) for i in xrange(k)])
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
Multiple iterators!
def window(seq, size, step=1):
# initialize iterators
iters = [iter(seq) for i in range(size)]
# stagger iterators (without yielding)
[next(iters[i]) for j in range(size) for i in range(-1, -j-1, -1)]
while(True):
yield [next(i) for i in iters]
# next line does nothing for step = 1 (skips iterations for step > 1)
[next(i) for i in iters for j in range(step-1)]
next(it)
raises StopIteration
when the sequence is finished, and for some cool reason that’s beyond me, the yield statement here excepts it and the function returns, ignoring the leftover values that don’t form a full window.
Anyway, this is the least-lines solution yet whose only requirement is that seq
implement either __iter__
or __getitem__
and doesn’t rely on itertools
or collections
besides @dansalmo’s solution 🙂
def rolling_window(list, degree):
for i in range(len(list)-degree+1):
yield [list[i+o] for o in range(degree)]
Made this for a rolling average function
why not
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = tee(iterable)
next(b, None)
return zip(a, b)
It is documented in Python doc .
You can easily extend it to wider window.
How about using the following:
mylist = [1, 2, 3, 4, 5, 6, 7]
def sliding_window(l, window_size=2):
if window_size > len(l):
raise ValueError("Window size must be smaller or equal to the number of elements in the list.")
t = []
for i in xrange(0, window_size):
t.append(l[i:])
return zip(*t)
print sliding_window(mylist, 3)
Output:
[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7)]
def GetShiftingWindows(thelist, size):
return [ thelist[x:x+size] for x in range( len(thelist) - size + 1 ) ]
>> a = [1, 2, 3, 4, 5]
>> GetShiftingWindows(a, 3)
[ [1, 2, 3], [2, 3, 4], [3, 4, 5] ]
Just to show how you can combine itertools
recipes, I’m extending the pairwise
recipe as directly as possible back into the window
recipe using the consume
recipe:
def consume(iterator, n):
"Advance the iterator n-steps ahead. If n is none, consume entirely."
# Use functions that consume iterators at C speed.
if n is None:
# feed the entire iterator into a zero-length deque
collections.deque(iterator, maxlen=0)
else:
# advance to the empty slice starting at position n
next(islice(iterator, n, n), None)
def window(iterable, n=2):
"s -> (s0, ...,s(n-1)), (s1, ...,sn), (s2, ..., s(n+1)), ..."
iters = tee(iterable, n)
# Could use enumerate(islice(iters, 1, None), 1) to avoid consume(it, 0), but that's
# slower for larger window sizes, while saving only small fixed "noop" cost
for i, it in enumerate(iters):
consume(it, i)
return zip(*iters)
The window
recipe is the same as for pairwise
, it just replaces the single element “consume” on the second tee
-ed iterator with progressively increasing consumes on n - 1
iterators. Using consume
instead of wrapping each iterator in islice
is marginally faster (for sufficiently large iterables) since you only pay the islice
wrapping overhead during the consume
phase, not during the process of extracting each window-ed value (so it’s bounded by n
, not the number of items in iterable
).
Performance-wise, compared to some other solutions, this is pretty good (and better than any of the other solutions I tested as it scales). Tested on Python 3.5.0, Linux x86-64, using ipython
%timeit
magic.
kindall’s the deque
solution, tweaked for performance/correctness by using islice
instead of a home-rolled generator expression and testing the resulting length so it doesn’t yield results when the iterable is shorter than the window, as well as passing the maxlen
of the deque
positionally instead of by keyword (makes a surprising difference for smaller inputs):
>>> %timeit -r5 deque(windowkindall(range(10), 3), 0)
100000 loops, best of 5: 1.87 μs per loop
>>> %timeit -r5 deque(windowkindall(range(1000), 3), 0)
10000 loops, best of 5: 72.6 μs per loop
>>> %timeit -r5 deque(windowkindall(range(1000), 30), 0)
1000 loops, best of 5: 71.6 μs per loop
Same as previous adapted kindall solution, but with each yield win
changed to yield tuple(win)
so storing results from the generator works without all stored results really being a view of the most recent result (all other reasonable solutions are safe in this scenario), and adding tuple=tuple
to the function definition to move use of tuple
from the B
in LEGB
to the L
:
>>> %timeit -r5 deque(windowkindalltupled(range(10), 3), 0)
100000 loops, best of 5: 3.05 μs per loop
>>> %timeit -r5 deque(windowkindalltupled(range(1000), 3), 0)
10000 loops, best of 5: 207 μs per loop
>>> %timeit -r5 deque(windowkindalltupled(range(1000), 30), 0)
1000 loops, best of 5: 348 μs per loop
consume
-based solution shown above:
>>> %timeit -r5 deque(windowconsume(range(10), 3), 0)
100000 loops, best of 5: 3.92 μs per loop
>>> %timeit -r5 deque(windowconsume(range(1000), 3), 0)
10000 loops, best of 5: 42.8 μs per loop
>>> %timeit -r5 deque(windowconsume(range(1000), 30), 0)
1000 loops, best of 5: 232 μs per loop
Same as consume
, but inlining else
case of consume
to avoid function call and n is None
test to reduce runtime, particularly for small inputs where the setup overhead is a meaningful part of the work:
>>> %timeit -r5 deque(windowinlineconsume(range(10), 3), 0)
100000 loops, best of 5: 3.57 μs per loop
>>> %timeit -r5 deque(windowinlineconsume(range(1000), 3), 0)
10000 loops, best of 5: 40.9 μs per loop
>>> %timeit -r5 deque(windowinlineconsume(range(1000), 30), 0)
1000 loops, best of 5: 211 μs per loop
(Side-note: A variant on pairwise
that uses tee
with the default argument of 2 repeatedly to make nested tee
objects, so any given iterator is only advanced once, not independently consumed an increasing number of times, similar to MrDrFenner’s answer is similar to non-inlined consume
and slower than the inlined consume
on all tests, so I’ve omitted it those results for brevity).
As you can see, if you don’t care about the possibility of the caller needing to store results, my optimized version of kindall’s solution wins most of the time, except in the “large iterable, small window size case” (where inlined consume
wins); it degrades quickly as the iterable size increases, while not degrading at all as the window size increases (every other solution degrades more slowly for iterable size increases, but also degrades for window size increases). It can even be adapted for the “need tuples” case by wrapping in map(tuple, ...)
, which runs ever so slightly slower than putting the tupling in the function, but it’s trivial (takes 1-5% longer) and lets you keep the flexibility of running faster when you can tolerate repeatedly returning the same value.
If you need safety against returns being stored, inlined consume
wins on all but the smallest input sizes (with non-inlined consume
being slightly slower but scaling similarly). The deque
& tupling based solution wins only for the smallest inputs, due to smaller setup costs, and the gain is small; it degrades badly as the iterable gets longer.
For the record, the adapted version of kindall’s solution that yield
s tuple
s I used was:
def windowkindalltupled(iterable, n=2, tuple=tuple):
it = iter(iterable)
win = deque(islice(it, n), n)
if len(win) < n:
return
append = win.append
yield tuple(win)
for e in it:
append(e)
yield tuple(win)
Drop the caching of tuple
in the function definition line and the use of tuple
in each yield
to get the faster but less safe version.
This is an old question but for those still interested there is a great implementation of a window slider using generators in this page (by Adrian Rosebrock).
It is an implementation for OpenCV however you can easily use it for any other purpose. For the eager ones i’ll paste the code here but to understand it better I recommend visiting the original page.
def sliding_window(image, stepSize, windowSize):
# slide a window across the image
for y in xrange(0, image.shape[0], stepSize):
for x in xrange(0, image.shape[1], stepSize):
# yield the current window
yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])
Tip: You can check the .shape
of the window when iterating the generator to discard those that do not meet your requirements
Cheers
There is a library which does exactly what you need:
import more_itertools
list(more_itertools.windowed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],n=3, step=3))
Out: [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]
Modified DiPaolo’s answer to allow arbitrary fill and variable step size
import itertools
def window(seq, n=2,step=1,fill=None,keep=0):
"Returns a sliding window (of width n) over data from the iterable"
" s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... "
it = iter(seq)
result = tuple(itertools.islice(it, n))
if len(result) == n:
yield result
while True:
# for elem in it:
elem = tuple( next(it, fill) for _ in range(step))
result = result[step:] + elem
if elem[-1] is fill:
if keep:
yield result
break
yield result
#Importing the numpy library
import numpy as np
arr = np.arange(6) #Sequence
window_size = 3
np.lib.stride_tricks.as_strided(arr, shape= (len(arr) - window_size +1, window_size),
strides = arr.strides*2)
"""Example output:
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
“””
Let’s make it lazy!
from itertools import islice, tee
def window(iterable, size):
iterators = tee(iterable, size)
iterators = [islice(iterator, i, None) for i, iterator in enumerate(iterators)]
yield from zip(*iterators)
list(window(range(5), 3))
# [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
here is a one liner. I timed it and it’s comprable to the performance of the top answer and gets progressively better with larger seq from 20% slower with len(seq) = 20 and 7% slower with len(seq) = 10000
zip(*[seq[i:(len(seq) - n - 1 + i)] for i in range(n)])
Trying my part, simple, one liner, pythonic way using islice. But, may not be optimally efficient.
from itertools import islice
array = range(0, 10)
window_size = 4
map(lambda i: list(islice(array, i, i + window_size)), range(0, len(array) - window_size + 1))
# output = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8], [6, 7, 8, 9]]
Explanation:
Create window by using islice of window_size and iterate this operation using map over all array.
I tested a few solutions along with the one I came up with. I found the one I came up with to be the fastest so I thought I would share my python3 implementation.
import itertools
import sys
def windowed(l, stride):
return zip(*[itertools.islice(l, i, sys.maxsize) for i in range(stride)])
Optimized Function for sliding window data in Deep learning
def SlidingWindow(X, window_length, stride):
indexer = np.arange(window_length)[None, :] + stride*np.arange(int(len(X)/stride)-window_length+4)[:, None]
return X.take(indexer)
to apply on multidimensional array
import numpy as np
def SlidingWindow(X, window_length, stride1):
stride= X.shape[1]*stride1
window_length = window_length*X.shape[1]
indexer = np.arange(window_length)[None, :] + stride1*np.arange(int(len(X)/stride1)-window_length-1)[:, None]
return X.take(indexer)
my two versions of window
implementation
from typing import Sized, Iterable
def window(seq: Sized, n: int, strid: int = 1, drop_last: bool = False):
for i in range(0, len(seq), strid):
res = seq[i:i + n]
if drop_last and len(res) < n:
break
yield res
def window2(seq: Iterable, n: int, strid: int = 1, drop_last: bool = False):
it = iter(seq)
result = []
step = 0
for i, ele in enumerate(it):
result.append(ele)
result = result[-n:]
if len(result) == n:
if step % strid == 0:
yield result
step += 1
if not drop_last:
yield result
Another simple way to generate window of fixed length from a list
from collections import deque
def window(ls,window_size=3):
window = deque(maxlen=window_size)
for element in ls:
if len(window)==window_size:
yield list(window)
window.append(element)
ls = [0,1,2,3,4,5]
for w in window(ls):
print(w)
My (keep it simple) solution that I ended up using:
def sliding_window(items, size):
return [items[start:end] for start, end
in zip(range(0, len(items) - size + 1), range(size, len(items) + 1))]
Needless to say, the items
sequence needs to be sliceable. Working with indices is not ideal, but it seems to be the least bad option given the alternatives… This can also easily be changed to a generator: just replace [...]
with (...)
.
In Python 3.10, we have the itertools.pairwise(iterable)
function to slide a window with two elements:
Here’s the doc :
Return successive overlapping pairs taken from the input iterable.
The number of 2-tuples in the output iterator will be one fewer than the number of inputs. It will be empty if the input iterable has fewer than two values.
Roughly equivalent to:
def pairwise(iterable): # pairwise('ABCDEFG') --> AB BC CD DE EF FG a, b = tee(iterable) next(b, None) return zip(a, b)
Update
This is a duplicated answer here, as Kelly find out. But I am leaving this here as a counter-example since I include a pointless min
.
So if you feel tempted to use min to avoid IndexError
don’t, there is no need, range
will handle that case for you.
Old answer
Curiously the following handle automatically when n > len(l)
returning []
which is semantically correct.
>>> l = [0, 1, 2, 3, 4]
>>> n = 2
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> [[0, 1], [1, 2], [2, 3], [3, 4]]
>>>
>>> n = 3
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
>>>
>>> n = 4
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> [[0, 1, 2, 3], [1, 2, 3, 4]]
>>>
>>> n = 5
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> [[0, 1, 2, 3, 4]]
>>>
>>> n = 10 # n > len(l)
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> []
I find this solution more elegant than using built-in functions.
words = ["this", "is", "an", "example"]
def get_sliding_windows(doc, sliding_window, padded=False):
all_windows = []
for i in range(sliding_window):
front = sliding_window-i
all_windows.append(front*['']+doc+i*[''])
if padded:
return np.array(all_windows).transpose()[1:]
else:
return np.array(all_windows).transpose()[sliding_window:-1]
>>> get_sliding_windows(words,3)
>>> array([['this', 'is', 'an'],
['is', 'an', 'example'],
['an', 'example', '']], dtype='<U7')
>>> get_padded_sliding_windows(words,3, True)
>>> array([['', '', 'this'],
['', 'this', 'is'],
['this', 'is', 'an'],
['is', 'an', 'example'],
['an', 'example', ''],
['example', '', '']], dtype='<U7')