Iterate over list selecting multiple elements at a time in Python
Question:
I have a list, from which I would like to iterate over slices of a certain length, overlapping each other by the largest amount possible, for example:
>>> seq = 'ABCDEF'
>>> [''.join(x) for x in zip(seq, seq[1:], seq[2:])]
['ABC', 'BCD', 'CDE', 'DEF']
In other words, is there a shorthand for zip(seq, seq[1:], seq[2:])
where you can specify the length of each sub-sequence?
Answers:
Not an elegant solution, but this works:
seq = 'ABCDEF'
n=3
[seq[i:i+n] for i in range(0, len(seq)+1-n)]
[seq[i:i+3] for i in range(len(seq)-2)]
does the same thing with the window length hard-coded to 3.
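The zip expression from the question also generalizes directly: build the n shifted slices with a generator expression and unpack them into zip(). A minimal sketch, assuming seq is a sliceable sequence such as a str or list (the function name windows is just illustrative):

```python
def windows(seq, n):
    """Return overlapping length-n windows of seq as tuples.

    Equivalent to zip(seq, seq[1:], ..., seq[n-1:]). It relies on slicing,
    so it does not work on generators or other one-shot iterators.
    """
    return zip(*(seq[i:] for i in range(n)))


seq = 'ABCDEF'
print([''.join(w) for w in windows(seq, 3)])  # ['ABC', 'BCD', 'CDE', 'DEF']
print(list(windows(seq, 7)))                  # [] (window longer than seq)
```

Each seq[i:] copies part of the sequence, so for long inputs the tee-based approach avoids that extra memory.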
A more elegant and general approach is to build on the itertools library (seriously, why is a general n-wise function not just part of the library?).
Here you would adapt the pairwise() recipe from the itertools documentation (itertools.pairwise() itself is built in since Python 3.10) so that it produces triples:
from itertools import tee

def tripletWise(iterable):
    "s -> (s0,s1,s2), (s1,s2,s3), (s2,s3,s4), ..."
    a, b, c = tee(iterable, 3)
    next(b, None)        # advance b by one element
    next(c, None)        # advance c by two elements
    next(c, None)
    return zip(a, b, c)  # yields (s0, s1, s2), (s1, s2, s3), ...
[''.join(i) for i in tripletWise('ABCDEF')]
> ['ABC', 'BCD', 'CDE', 'DEF']
You can also write a more general function that slides a window of any length n over the iterable:
from itertools import tee

def nWise(iterable, n=2):
    iterableList = tee(iterable, n)
    # advance the i-th copy by i elements so the copies are staggered
    for i in range(len(iterableList)):
        for j in range(i):
            next(iterableList[i], None)
    return zip(*iterableList)
[''.join(i) for i in nWise('ABCDEF', 4)]
> ['ABCD', 'BCDE', 'CDEF']
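The nested next() loop in nWise can also be replaced by itertools.islice(), which skips the first i items of each copy lazily. A sketch of that variant (the name n_wise is illustrative); since it only uses iterators, it also works on generators that cannot be sliced:

```python
from itertools import islice, tee

def n_wise(iterable, n=2):
    """Return an iterator of overlapping tuples of length n from any iterable."""
    iterators = tee(iterable, n)
    # Skip i items on the i-th copy so the copies are staggered by one.
    staggered = (islice(it, i, None) for i, it in enumerate(iterators))
    return zip(*staggered)


print([''.join(w) for w in n_wise('ABCDEF', 4)])   # ['ABCD', 'BCDE', 'CDEF']
print(list(n_wise((x * x for x in range(5)), 3)))  # [(0, 1, 4), (1, 4, 9), (4, 9, 16)]
```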
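Look at grouper() in the itertools recipes, specifically grouper(<iter>, 3). Note that grouper() splits the input into non-overlapping groups ('ABC', 'DEF'); for overlapping windows like the ones in the question, the sliding_window() recipe on the same page is the closer fit.
https://docs.python.org/3/library/itertools.html#itertools-recipes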
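The itertools recipes page includes a sliding_window() recipe for overlapping windows. A deque-based sketch along the lines of that recipe, assuming n >= 1 (details may differ from the version in the docs):

```python
from collections import deque
from itertools import islice

def sliding_window(iterable, n):
    """Yield overlapping tuples of length n, keeping only n items in memory."""
    it = iter(iterable)
    # Pre-fill with the first n-1 items; maxlen=n drops the oldest item
    # automatically each time a new one is appended.
    window = deque(islice(it, n - 1), maxlen=n)
    for x in it:
        window.append(x)
        yield tuple(window)


print([''.join(w) for w in sliding_window('ABCDEF', 3)])  # ['ABC', 'BCD', 'CDE', 'DEF']
```

Unlike the tee-based versions, this holds a single buffer of n items rather than n independent iterators.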
Or, as the same page recommends, install more-itertools. Its chunked() and ichunked() also produce non-overlapping chunks; for overlapping windows it provides windowed(), e.g. windowed('ABCDEF', 3).