Iterate an iterator by chunks (of n) in Python?

Question:

Can you think of a nice way (maybe with itertools) to split an iterator into chunks of given size?

Therefore l=[1,2,3,4,5,6,7] with chunks(l,3) becomes an iterator [1,2,3], [4,5,6], [7]

I can think of a small program to do that but not a nice way with maybe itertools.

Asked By: Gere

||

Answers:

The grouper() recipe from the itertools documentation’s recipes comes close to what you want:

def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')

This won’t work well when the last chunk is incomplete though, as, depending on the incomplete mode, it will either fill up the last chunk with a fill value, raise an exception, or silently drop the incomplete chunk.

In more recent versions of the recipes they added the batched recipe that does exactly what you want:

def batched(iterable, n):
    "Batch data into tuples of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while (batch := tuple(islice(it, n))):
        yield batch

Finally, a less general solution that only works on sequences but does handle the last chunk as desired and preserves the type of the original sequence is:

(my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size))
Answered By: Sven Marnach

“Simpler is better than complex” –
a straightforward generator a few lines long can do the job. Just place it in some utilities module or so:

def grouper (iterable, n):
    iterable = iter(iterable)
    count = 0
    group = []
    while True:
        try:
            group.append(next(iterable))
            count += 1
            if count % n == 0:
                yield group
                group = []
        except StopIteration:
            yield group
            break
Answered By: jsbueno

Here’s one that returns lazy chunks; use map(list, chunks(...)) if you want lists.

from itertools import islice, chain
from collections import deque

def chunks(items, n):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, n-1))
        yield chunk
        deque(chunk, 0)

if __name__ == "__main__":
    for chunk in map(list, chunks(range(10), 3)):
        print chunk

    for i, chunk in enumerate(chunks(range(10), 3)):
        if i % 2 == 1:
            print "chunk #%d: %s" % (i, list(chunk))
        else:
            print "skipping #%d" % i
Answered By: Peter Otten

I forget where I found the inspiration for this. I’ve modified it a little to work with MSI GUID’s in the Windows Registry:

def nslice(s, n, truncate=False, reverse=False):
    """Splits s into n-sized chunks, optionally reversing the chunks."""
    assert n > 0
    while len(s) >= n:
        if reverse: yield s[:n][::-1]
        else: yield s[:n]
        s = s[n:]
    if len(s) and not truncate:
        yield s

reverse doesn’t apply to your question, but it’s something I use extensively with this function.

>>> [i for i in nslice([1,2,3,4,5,6,7], 3)]
[[1, 2, 3], [4, 5, 6], [7]]
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True)]
[[1, 2, 3], [4, 5, 6]]
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True, reverse=True)]
[[3, 2, 1], [6, 5, 4]]
Answered By: Zach Young

Here you go.

def chunksiter(l, chunks):
    i,j,n = 0,0,0
    rl = []
    while n < len(l)/chunks:        
        rl.append(l[i:j+chunks])        
        i+=chunks
        j+=j+chunks        
        n+=1
    return iter(rl)


def chunksiter2(l, chunks):
    i,j,n = 0,0,0
    while n < len(l)/chunks:        
        yield l[i:j+chunks]
        i+=chunks
        j+=j+chunks        
        n+=1

Examples:

for l in chunksiter([1,2,3,4,5,6,7,8],3):
    print(l)

[1, 2, 3]
[4, 5, 6]
[7, 8]

for l in chunksiter2([1,2,3,4,5,6,7,8],3):
    print(l)

[1, 2, 3]
[4, 5, 6]
[7, 8]


for l in chunksiter2([1,2,3,4,5,6,7,8],5):
    print(l)

[1, 2, 3, 4, 5]
[6, 7, 8]
Answered By: Carlos Quintanilla

A succinct implementation is:

chunker = lambda iterable, n: (ifilterfalse(lambda x: x == (), chunk) for chunk in (izip_longest(*[iter(iterable)]*n, fillvalue=())))

This works because [iter(iterable)]*n is a list containing the same iterator n times; zipping over that takes one item from each iterator in the list, which is the same iterator, with the result that each zip-element contains a group of n items.

izip_longest is needed to fully consume the underlying iterable, rather than iteration stopping when the first exhausted iterator is reached, which chops off any remainder from iterable. This results in the need to filter out the fill-value. A slightly more robust implementation would therefore be:

def chunker(iterable, n):
    class Filler(object): pass
    return (ifilterfalse(lambda x: x is Filler, chunk) for chunk in (izip_longest(*[iter(iterable)]*n, fillvalue=Filler)))

This guarantees that the fill value is never an item in the underlying iterable. Using the definition above:

iterable = range(1,11)

map(tuple,chunker(iterable, 3))
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10,)]

map(tuple,chunker(iterable, 2))
[(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

map(tuple,chunker(iterable, 4))
[(1, 2, 3, 4), (5, 6, 7, 8), (9, 10)]

This implementation almost does what you want, but it has issues:

def chunks(it, step):
  start = 0
  while True:
    end = start+step
    yield islice(it, start, end)
    start = end

(The difference is that because islice does not raise StopIteration or anything else on calls that go beyond the end of it this will yield forever; there is also the slightly tricky issue that the islice results must be consumed before this generator is iterated).

To generate the moving window functionally:

izip(count(0, step), count(step, step))

So this becomes:

(it[start:end] for (start,end) in izip(count(0, step), count(step, step)))

But, that still creates an infinite iterator. So, you need takewhile (or perhaps something else might be better) to limit it:

chunk = lambda it, step: takewhile((lambda x: len(x) > 0), (it[start:end] for (start,end) in izip(count(0, step), count(step, step))))

g = chunk(range(1,11), 3)

tuple(g)
([1, 2, 3], [4, 5, 6], [7, 8, 9], [10])

Answered By: Marcin

Although OP asks function to return chunks as list or tuple, in case you need to return iterators, then Sven Marnach’s solution can be modified:

def batched_it(iterable, n):
    "Batch data into iterators of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            return
        yield itertools.chain((first_el,), chunk_it)

Some benchmarks: http://pastebin.com/YkKFvm8b

It will be slightly more efficient only if your function iterates through elements in every chunk.

Answered By: reclosedev

I was working on something today and came up with what I think is a simple solution. It is similar to jsbueno’s answer, but I believe his would yield empty groups when the length of iterable is divisible by n. My answer does a simple check when the iterable is exhausted.

def chunk(iterable, chunk_size):
    """Generates lists of `chunk_size` elements from `iterable`.
    
    
    >>> list(chunk((2, 3, 5, 7), 3))
    [[2, 3, 5], [7]]
    >>> list(chunk((2, 3, 5, 7), 2))
    [[2, 3], [5, 7]]
    """
    iterable = iter(iterable)
    while True:
        chunk = []
        try:
            for _ in range(chunk_size):
                chunk.append(next(iterable))
            yield chunk
        except StopIteration:
            if chunk:
                yield chunk
            break
Answered By: eidorb

This will work on any iterable. It returns generator of generators (for full flexibility). I now realize that it’s basically the same as @reclosedevs solution, but without the fluff. No need for try...except as the StopIteration propagates up, which is what we want.

The next(iterable) call is needed to raise the StopIteration when the iterable is empty, since islice will continue spawning empty generators forever if you let it.

It’s better because it’s only two lines long, yet easy to comprehend.

def grouper(iterable, n):
    while True:
        yield itertools.chain((next(iterable),), itertools.islice(iterable, n-1))

Note that next(iterable) is put into a tuple. Otherwise, if next(iterable) itself were iterable, then itertools.chain would flatten it out. Thanks to Jeremy Brown for pointing out this issue.

Answered By: Svein Lindal

Since python 3.8, there is a simpler solution using the := operator:

def grouper(iterator: Iterator, n: int) -> Iterator[list]:
    while chunk := list(itertools.islice(iterator, n)):
        yield chunk

and then call it that way:

>>> list(grouper(iter('ABCDEFG'), 3))
[['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]

Note: you can put iter in the grouper function to take an Iterable instead of an Iterator.

Answered By: Cédric ROYER

Code golf edition:

def grouper(iterable, n):
    for i in range(0, len(iterable), n):
        yield iterable[i:i+n]

Usage:

>>> list(grouper('ABCDEFG', 3))
['ABC', 'DEF', 'G']
Answered By: johnson

This function takes iterables which do not need to be Sized, so it will accept iterators too. It supports infinite iterables and will error-out if chunks with a smaller size than 1 are selected (even though giving size == 1 is effectively useless).

The type annotations are of course optional and the / in the parameters (which makes iterable positional-only) can be removed if you wish.

T = TypeVar("T")


def chunk(iterable: Iterable[T], /, size: int) -> Generator[list[T], None, None]:
    """Yield chunks of a given size from an iterable."""
    if size < 1:
        raise ValueError("Cannot make chunks smaller than 1 item.")

    def chunker():
        current_chunk = []
        for item in iterable:
            current_chunk.append(item)

            if len(current_chunk) == size:
                yield current_chunk

                current_chunk = []

        if current_chunk:
            yield current_chunk

    # Chunker generator is returned instead of yielding directly so that the size check
    #  can raise immediately instead of waiting for the first next() call.
    return chunker()
Answered By: theberzi

A couple improvements on reclosedev’s answer that make it:

  1. Operate more efficiently and with less boilerplate code in the loop by delegating the pulling of the first element to Python itself, rather than manually doing so with a next call in a try/except StopIteration: block

  2. Handle the case where the user discards the rest of the elements in any given chunk (e.g. an inner loop over the chunk breaks under certain conditions); in reclosedev’s solution, aside from the very first element (which is definitely consumed), any other "skipped" elements aren’t actually skipped (they just become the initial elements of the next chunk, which means you’re no longer pulling data from n-aligned offsets, and if the caller breaks a loop over a chunk, they must manually consume the remaining elements even if they don’t need them)

Combining those two fixes gets:

import collections  # At top of file
from itertools import chain, islice  # At top of file, denamespaced for slight speed boost

# Pre-create a utility "function" that silently consumes and discards all remaining elements in
# an iterator. This is the fastest way to do so on CPython (deque has a specialized mode
# for maxlen=0 that pulls and discards faster than Python level code can, and by precreating
# the deque and prebinding the extend method, you don't even need to create new deques each time)
_consume = collections.deque(maxlen=0).extend

def batched_it(iterable, n):
    "Batch data into sub-iterators of length n. The last batch may be shorter."
    # batched_it('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    n -= 1  # First element pulled for us, pre-decrement n so we don't redo it every loop
    it = iter(iterable)
    for first_el in it:
        chunk_it = islice(it, n)
        try:
            yield chain((first_el,), chunk_it)
        finally:
            _consume(chunk_it)  # Efficiently consume any elements caller didn't consume

Try it online!

Answered By: ShadowRanger

Python 3.12 adds itertools.batched, which works on all iterables (including lists):

>>> from itertools import batched
>>> for batch in batched('ABCDEFG', 3):
...     print(batch)
('A', 'B', 'C')
('D', 'E', 'F')
('G',)
Answered By: mike

Recursive solution:

def batched(i: Iterable, split: int) -> Tuple[Iterable, ...]:
    if chunk := i[:split]:
        yield chunk
        yield from batched(i[split:], split)
Answered By: zhukovgreen

Here is a simple one:

n=2
l = list(range(15))
[l[i:i+n] for i in range(len(l)) if i%n==0]
Out[10]: [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13], [14]]

for i in range(len(l)): This part specifies the iteration over the indices of l using the range() function and len(l) as the upper limit.

if i % n == 0: This condition filters the elements for the new list. i % n checks if the current index i is divisible by n without a remainder. If it is, the element at that index will be included in the new list; otherwise, it will be skipped.

l[i:i+n]: This part extracts a sublist from l. It uses slicing notation to specify a range of indices from i to i+n-1. So, for each index i that meets the condition i % n == 0, a sublist of length n is created, starting from that index.

Alternative (faster for bigger stuff):

[l[i:i+n] for i in range(0,len(l),n)]
Answered By: Hans
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.