Iterate over a Python sequence in multiples of n?

Question:

How do I process the elements of a sequence in batches, idiomatically?

For example, with the sequence “abcdef” and a batch size of 2, I would like to do something like the following:

for x, y in "abcdef":
    print "%s%s\n" % (x, y)
ab
cd
ef

Of course, this doesn’t work because it is expecting a single element from the list which itself contains 2 elements.
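
To make the failure concrete, here is a small sketch (written for Python 3, which is an assumption; the question itself uses Python 2 print statements). It shows the unpacking error and, for contrast, a loop over items that really are pairs. The variable names are illustrative only.

```python
# Iterating over a string yields one character at a time, so there is
# nothing to unpack into two names and the loop fails on the first item.
try:
    for x, y in "abcdef":
        print(x, y)
except ValueError as exc:
    error = str(exc)
    print("ValueError:", error)

# Unpacking works once each item is itself a two-element sequence.
joined = []
for x, y in ["ab", "cd", "ef"]:
    joined.append(x + y)
print(joined)  # ['ab', 'cd', 'ef']
```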

What is a nice, short, clean, pythonic way to process the next n elements of a list in a batch, or sub-strings of length n from a larger string (two similar problems)?

Asked By: newbpy

Answers:

One solution, although I challenge someone to do better 😉

a = 'abcdef'
b = [[a[i-1], a[i]] for i in range(1, len(a), 2)]

for x, y in b:
  print "%s%s\n" % (x, y)
Answered By: Jason Coon

I am sure someone is going to come up with some more “Pythonic” but how about:

for y in range(0, len(x), 2):
    print "%s%s" % (x[y], x[y+1])

Note that this only works if len(x) is even (len(x) % 2 == 0); otherwise x[y+1] raises an IndexError on the last iteration.

Answered By: Paolo Bergantino

You can create the following generator:

def chunks(seq, size):
    a = range(0, len(seq), size)
    b = range(size, len(seq) + 1, size)
    for i, j in zip(a, b):
        yield seq[i:j]

and use it like this:

for i in chunks('abcdef', 2):
    print(i)
Answered By: SilentGhost

A generator function would be neat:

def batch_gen(data, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i+batch_size]

Example use:

a = "abcdef"
for i in batch_gen(a, 2): print i

prints:

ab
cd
ef
Answered By: rpr

Don’t forget about the zip() function:

a = 'abcdef'
for x,y in zip(a[::2], a[1::2]):
  print '%s%s' % (x,y)
Answered By: Jason Coon

A more general way (inspired by this answer) would be:

for i in zip(*(seq[i::size] for i in range(size))):
    print(i)                            # tuple of individual values
Answered By: SilentGhost

I’ve got an alternative approach that works for iterables without a known length.

   
def groupsgen(seq, size):
    it = iter(seq)
    while True:
        values = ()        
        for n in xrange(size):
            values += (it.next(),)        
        yield values    

It works by iterating over the sequence (or other iterator) in groups of size items, collecting the values in a tuple. At the end of each group, it yields the tuple.

When the iterator runs out of values, it produces a StopIteration exception which is then propagated up, indicating that groupsgen is out of values.

It assumes that the values come in sets of size (sets of 2, 3, etc). If not, any values left over are just discarded.
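
On Python 3.7 and later, a StopIteration that escapes a generator body is converted into a RuntimeError (PEP 479), so the pattern above needs adjusting there. Below is a minimal sketch of the same behaviour for Python 3, including the discarding of leftover values, assuming tuples are an acceptable group type. The name groups_strict is hypothetical.

```python
def groups_strict(iterable, size):
    """Return an iterator of tuples of exactly `size` items; leftovers are dropped."""
    it = iter(iterable)
    # The same iterator is repeated `size` times, so zip() pulls `size`
    # consecutive items per tuple and stops at the first incomplete group.
    return zip(*[it] * size)


print(list(groups_strict(iter("abcdefg"), 3)))
# [('a', 'b', 'c'), ('d', 'e', 'f')]
```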

Answered By: Silverfish

How about itertools?

from itertools import islice, groupby

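def chunks_islice(seq, size):
    it = iter(seq)  # one shared iterator, so each islice resumes where the last stopped
    while True:
        aux = list(islice(it, size))
        if not aux:
            break
        yield "".join(aux)

def chunks_groupby(seq, size):
    # Floor division keeps the group key an integer on Python 3 as well.
    for k, chunk in groupby(enumerate(seq), lambda x: x[0] // size):
        yield "".join([i[1] for i in chunk])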
Answered By: Jochen Wersdörfer

>>> a = "abcdef"
>>> size = 2
>>> [a[x:x+size] for x in range(0, len(a), size)]
['ab', 'cd', 'ef']

Or, without a list comprehension:

a = "abcdef"
size = 2
output = []
for x in range(0, len(a), size):
    output.append(a[x:x+size])

Or, as a generator, which would be best if used multiple times (for a one-use thing, the list comprehension is probably “best”):

def chunker(thelist, segsize):
    for x in range(0, len(thelist), segsize):
        yield thelist[x:x+segsize]

And its usage:

>>> for seg in chunker(a, 2):
...     print seg
... 
ab
cd
ef
Answered By: dbr

And then there’s always the documentation.

from itertools import chain, izip, repeat, tee  # izip is Python 2 only

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    try:
        b.next()
    except StopIteration:
        pass
    return izip(a, b)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

Note: these produce tuples instead of substrings, when given a string sequence as input.
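
If substrings are wanted, the tuples can be joined back together. The following is a sketch of a Python 3 port of the grouper recipe above (zip replaces izip, which is an adaptation and not part of the original answer), followed by the join step.

```python
from itertools import chain, repeat


def grouper(n, iterable, padvalue=None):
    """Python 3 port of the recipe above: zip() replaces izip()."""
    return zip(*[chain(iterable, repeat(padvalue, n - 1))] * n)


# Each group is a tuple of characters; "".join() turns it back into a string.
substrings = ["".join(group) for group in grouper(3, "abcdefg", "x")]
print(substrings)
# ['abc', 'def', 'gxx']
```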

Answered By: tzot

s = 'abcdefgh'
for e in (s[i:i+2] for i in range(0,len(s),2)):
  print(e)
Answered By: Jacob Engelbrecht

The itertools doc has a recipe for this:

from itertools import izip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

Usage:

>>> l = [1,2,3,4,5,6,7,8,9]
>>> [z for z in grouper(l, 3)]
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
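
When the length is not a multiple of n, this recipe pads the last group with fillvalue (None by default). If a shorter final group is preferred, one possible variation, sketched here for Python 3, pads with a private sentinel and strips it again. The names grouper_trimmed and _PAD are hypothetical and not part of the itertools recipe.

```python
from itertools import zip_longest  # named izip_longest on Python 2

_PAD = object()  # unique sentinel, so real None values are not stripped


def grouper_trimmed(iterable, n):
    """Like grouper(), but the final group is shortened instead of padded."""
    args = [iter(iterable)] * n
    for group in zip_longest(*args, fillvalue=_PAD):
        yield tuple(item for item in group if item is not _PAD)


print(list(grouper_trimmed([1, 2, 3, 4, 5, 6, 7], 3)))
# [(1, 2, 3), (4, 5, 6), (7,)]
```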
Answered By: dano

Apart from two answers, I saw a lot of premature materialization of the batches, and subscripting (which does not work for all iterators). Hence I came up with this alternative:

def iter_x_and_n(iterable, x, n):
    yield x
    try:
        for _ in range(n):
            yield next(iterable)
    except StopIteration:
        pass

def batched(iterable, n):
    if n<1: raise ValueError("Can not create batches of size %d, number must be strictly positive" % n)
    iterable = iter(iterable)
    try:
        for x in iterable:
            yield iter_x_and_n(iterable, x, n-1)
    except StopIteration:
        pass

It beats me that there is no one-liner or few-liner solution for this (to the best of my knowledge). The key issue is that both the outer generator and the inner generator need to handle StopIteration correctly. The outer generator should only yield something if there is at least one element left. The intuitive way to check this is to call next(…) and catch the StopIteration.
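
One consequence of yielding lazy sub-iterators is that each batch has to be consumed before the next one is requested. The sketch below is a compact Python 3 restatement of the same idea under the hypothetical name lazy_batches (not the code above verbatim), and it shows both the intended use and what happens when the batches are collected first.

```python
def lazy_batches(iterable, n):
    """Yield one lazy sub-iterator per batch of up to `n` items."""
    if n < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(iterable)

    def rest_of_batch(first):
        yield first
        for _ in range(n - 1):
            try:
                yield next(it)
            except StopIteration:
                return  # source ran out mid-batch

    # The for loop only produces a batch if at least one item remains.
    for first in it:
        yield rest_of_batch(first)


# Consuming each batch before asking for the next gives the expected result.
consumed = [list(batch) for batch in lazy_batches(range(7), 3)]
print(consumed)  # [[0, 1, 2], [3, 4, 5], [6]]

# Collecting the batches first lets the outer loop race ahead of them.
unconsumed = [list(batch) for batch in list(lazy_batches(range(7), 3))]
print(unconsumed)  # [[0], [1], [2], [3], [4], [5], [6]]
```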

Answered By: Herbert

From the docs of more_itertools: more_itertools.chunked()

more_itertools.chunked(iterable, n)

Break an iterable into lists of a given length:

>>> list(chunked([1, 2, 3, 4, 5, 6, 7], 3))
[[1, 2, 3], [4, 5, 6], [7]]

If the length of iterable is not evenly divisible by n, the last returned list will be shorter.
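
If installing more_itertools is not an option, similar behaviour can be sketched with the standard library alone. This is an illustration under that assumption, not the library's actual implementation, and chunked_sketch is a hypothetical name.

```python
from itertools import islice


def chunked_sketch(iterable, n):
    """Return an iterator of lists of up to `n` items; the last may be shorter."""
    it = iter(iterable)
    # iter(callable, sentinel) keeps calling the lambda until it returns [].
    return iter(lambda: list(islice(it, n)), [])


print(list(chunked_sketch([1, 2, 3, 4, 5, 6, 7], 3)))
# [[1, 2, 3], [4, 5, 6], [7]]
```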

Answered By: Gregor Melhorn

Given

from __future__ import print_function                      # python 2.x

seq = "abcdef"
n = 2

Code

while seq:
    print("{}".format(seq[:n]), end="\n")
    seq = seq[n:]

Output

ab
cd
ef
Answered By: pylang

Here is a solution, which yields a series of iterators, each of which iterates over n items.

def groupiter(thing, n):
    def countiter(nextthing, thingiter, n):
        yield nextthing
        for _ in range(n - 1):
            try:
                nextitem = next(thingiter)
            except StopIteration:
                return
            yield nextitem
    thingiter = iter(thing)
    while True:
        try:
            nextthing = next(thingiter)
        except StopIteration:
            return
        yield countiter(nextthing, thingiter, n)

I use it as follows:

table = list(range(250))
for group in groupiter(table, 16):
    print(' '.join('0x{:02X},'.format(x) for x in group))

Note that it can handle the length of the object not being a multiple of n.

Answered By: Craig McQueen

Adapted from this answer for Python 3:

def groupsgen(seq, size):
    it = iter(seq)
    iterating = True
    while iterating:
        values = ()
        try:
            for n in range(size):
                values += (next(it),)
        except StopIteration:
            iterating = False
            if not len(values):
                return None
        yield values

It will safely terminate and won’t discard values if their number is not divisible by size.

Answered By: Karol Trojanowski