Improve itertools.pairwise() function

Question:

I am trying to create a custom function that improves the itertools pairwise function.

Unlike the pairwise function, which returns pairs of items (n=2), I would like to be able to specify the grouping length (n=x). So that I can return groups of items of a specified length. Is this possible?

So when n=2, the groupwise function should be equivalent to the itertools pairwise function. But when n=3, I would expect it to return groups of 3.

This is what I have so far – which is essentially the same as the pairwise function. ie. works for n=2.

Code:

from itertools import tee

my_string = 'abcdef'

def groupwise(iterable, n):
    a = tee(iterable, n)
    next(a[n-1], None)
    return zip(a[0], a[1])

for item in groupwise(my_string, 2):
    print("".join(item))

Output:

ab
bc
cd
de
ef

I am trying to modify the code above to accept n=3, n=4, n=5, ... n=x, but I cannot think of a way to provide the undetermined number of components to the zip function considering that I would like to specify any value of n <= len(my_string)

When n=3 the output should be:

abc
bcd
cde
def
Asked By: ScottC

||

Answers:

tee isn’t going to scale in this way, unfortunately. Writing pairwise takes one tee call, and if you want to do it for N elements in each group, you’ll need N-1 tee calls.

Fortunately, we can do better by just rolling the loop ourselves. What we’re going to do is keep track of the last N elements that we’ve seen ourselves and yield them whenever we have a group that’s big enough. To do that, we need a data structure that can efficiently add things to the right and subtract them from the left. collections.deque will do nicely. In fact, it even has a maxlen parameter that will automatically subtract from the left exactly when we need it to.


from collections import deque

def groupwise(iterable, n):
    accum = deque((), n)
    for element in iterable:
        accum.append(element)
        if len(accum) == n:
            yield tuple(accum)

Construct an empty deque (the first constructor argument is the initial elements: an empty tuple) whose capacity is n. When we have more than n elements added to the right side, it will automatically drop the leftmost one. We could do this by hand, but deque will do it for us so we might as well utilize the functionality they gave us.

Then iterate over the iterable. Append each element to the end of the deque (dropping from the left as needed to satisfy the maximum length requirement). Then, if the deque has n elements, we yield it. That makes our function a generator function, so it will produce a new iterator consisting of all of the yielded values.

We make a shallow copy of our deque so we’re not just yielding the same (mutable) deque object each time. We also make that shallow copy a tuple, rather than a deque, to be more in line with other itertools functionality.

To call, use

print(list(groupwise(range(10), 3)))
# Output: [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9)]
Answered By: Silvio Mayolo

As per Stef‘s comments, there are some other alternative answers. Neither are significantly different from Silvio‘s answer, but I thought to include them here for completeness:

Stef’s code:

from itertools import islice
from collections import deque

def groupwise(iterable, n):
    iterator = iter(iterable); 
    accum = deque(islice(iterator, n-1), maxlen=n); 
    
    for x in iterator: 
        accum.append(x); 
        yield tuple(accum)

Kelly Bundy’s Code:

from itertools import islice

def groupwise(iterable, n):
  it = iter(iterable)
  accum = None, *islice(it, n-1)
  for x in zip(it):
    accum = accum[1:] + x
    yield accum

rici’s Code:

from itertools import islice
from collections import deque

def groupwise(iterable, n):     
    it = iter(iterable)     
    if n > 0 and len(accum := deque(islice(it, n-1), maxlen=n)) == n - 1:         
        return (accum.append(v) or tuple(accum) for v in it)      
    else:         
        return ()

rici v2 Code:

from itertools import islice
from collections import deque

def groupwise(iterable, n): 
    it = iter(iterable) 
    if n < 1: 
        return () 
    accum = deque(islice(it, n-1), maxlen = n) 
    return (accum.append(v) or tuple(accum) for v in it)

Using Windowed:

As per Stef’s comment, there is a function within more-itertools called windowed that provides the required functionality:

from more_itertools import windowed

my_string = 'abcdef'

for item in windowed(my_string, 3):
    print("".join(item))

ScottC’s version adapted from Windowed:

Borrowing a portion of the windowed code, and taking advice from Kelly Bundy, Stef and rici

from itertools import islice
from collections import deque

def groupwise(iterable, n): 
    if n < 1:
        return None

    it = iter(iterable) 
    accum = deque(islice(it, n-1), maxlen=n)
        
    for _ in map(accum.append, it):
        yield tuple(accum)

print(list(groupwise('abcde',3)))

Output:

[('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e')]
Answered By: ScottC