I can’t figure out how to look ahead one element in a Python generator. As soon as I look it’s gone.
Here is what I mean:
    gen = iter([1,2,3])
    next_value = gen.next()  # okay, I looked forward and see that next_value = 1
    # but now:
    list(gen)  # is [2, 3] -- the first value is gone!
Here is a more real example:
    gen = element_generator()
    if gen.next_value() == 'STOP':
        quit_application()
    else:
        process(gen.next())
Can anyone help me write a generator that lets you look one element ahead?
See also: Resetting generator object in Python
The Python generator API is one-way: you can’t push back elements you’ve read. But you can create a new iterator using the itertools module and prepend the element:
    import itertools

    gen = iter([1,2,3])
    peek = next(gen)  # consume the first element ...
    print(list(itertools.chain([peek], gen)))  # ... and chain it back in front
This will work — it buffers an item and calls a function with each item and the next item in the sequence.
Your requirements are murky on what happens at the end of the sequence. What does “look ahead” mean when you’re at the last one?
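    def process_with_lookahead(iterable, aFunction):
        iterator = iter(iterable)  # accept any iterable, not only iterators
        prev = next(iterator)  # raises StopIteration if the input is empty
        for item in iterator:
            aFunction(prev, item)
            prev = item
        aFunction(prev, None)  # the last item has no successor

    def someLookaheadFunction(item, next_item):
        print(item, next_item)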
Ok – two years too late – but I came across this question, and did not find any of the answers to my satisfaction. Came up with this meta generator:
    class Peekorator(object):

        def __init__(self, generator):
            self.empty = False
            self.peek = None
            self.generator = generator
            try:
                self.peek = next(self.generator)
            except StopIteration:
                self.empty = True

        def __iter__(self):
            return self

        def __next__(self):
            """
            Return the self.peek element, or raise StopIteration
            if empty
            """
            if self.empty:
                raise StopIteration()
            to_return = self.peek
            try:
                self.peek = next(self.generator)
            except StopIteration:
                self.peek = None
                self.empty = True
            return to_return

        next = __next__  # Python 2 spelling of the iterator protocol

    def simple_iterator():
        for x in range(10):
            yield x*3

    pkr = Peekorator(simple_iterator())
    for i in pkr:
        print(i, pkr.peek, pkr.empty)
    0 3 False
    3 6 False
    6 9 False
    9 12 False
    ...
    24 27 False
    27 None True
i.e. you have at any moment during iteration access to the next item in the list.
Instead of using items (i, i+1), where ‘i’ is the current item and i+1 is the ‘peek ahead’ version, you should be using (i-1, i), where ‘i-1’ is the previous version from the generator.
Tweaking your algorithm this way will produce something that is identical to what you currently have, apart from the extra needless complexity of trying to ‘peek ahead’.
Peeking ahead is a mistake, and you should not be doing it.
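A minimal sketch of that reformulation, applied to the 'STOP' example from the question. The function name items_before_stop is made up for illustration, and the sketch assumes that the item directly before a 'STOP' marker should be dropped, which is what a peeking version that quits on seeing 'STOP' ahead would do:

```python
def items_before_stop(iterable, stop="STOP"):
    """Yield each item whose successor is not the stop marker.

    Instead of peeking ahead, remember the previous item and treat the
    current item as the "lookahead". (Hypothetical helper, not from the thread.)
    """
    missing = object()        # sentinel meaning "no previous item yet"
    prev = missing
    for current in iterable:
        if current == stop:
            return            # prev is dropped: its successor is the marker
        if prev is not missing:
            yield prev        # safe to process: we now know what follows it
        prev = current
    if prev is not missing:
        yield prev            # the last item has no successor at all


processed = list(items_before_stop([1, 2, 3, "STOP", 4]))
```

Whether the item just before the marker should be processed or skipped is a requirement of the caller, so the early return may need adjusting.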
itertools.tee will produce a lightweight copy of the generator; then peeking ahead at one copy will not affect the second copy. Thus:
    import itertools

    def process(seq):
        peeker, items = itertools.tee(seq)

        # initial peek ahead
        # so that peeker is one ahead of items
        if next(peeker, None) == 'STOP':
            return

        for item in items:

            # peek ahead; the default avoids a StopIteration on the last item
            if next(peeker, None) == "STOP":
                return

            # process items
            print(item)
The items generator is unaffected by modifications to peeker. However, modifying seq after the call to tee may cause problems.
That said: any algorithm that requires looking an item ahead in a generator could instead be written to use the current generator item and the previous item. This will result in simpler code – see my other answer to this question.
    >>> gen = iter(range(10))
    >>> peek = next(gen)
    >>> peek
    0
    >>> gen = (value for g in ([peek], gen) for value in g)
    >>> list(gen)
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Just for fun, I created an implementation of a lookahead class based on the suggestion by
    import itertools

    class lookahead_chain(object):
        def __init__(self, it):
            self._it = iter(it)

        def __iter__(self):
            return self

        def __next__(self):
            return next(self._it)

        next = __next__  # Python 2 spelling of the iterator protocol

        def peek(self, default=None, _chain=itertools.chain):
            it = self._it
            try:
                v = next(it)
                # put the consumed value back in front of the iterator
                self._it = _chain((v,), it)
                return v
            except StopIteration:
                return default

    lookahead = lookahead_chain
With this, the following will work:
    >>> t = lookahead(range(8))
    >>> list(itertools.islice(t, 3))
    [0, 1, 2]
    >>> t.peek()
    3
    >>> list(itertools.islice(t, 3))
    [3, 4, 5]
With this implementation it is a bad idea to call peek many times in a row…
While looking at the CPython source code I just found a better way which is both shorter and more efficient:
    class lookahead_tee(object):
        def __init__(self, it):
            self._it, = itertools.tee(it, 1)

        def __iter__(self):
            return self._it

        def peek(self, default=None):
            try:
                # advance a copy, leaving self._it where it is
                return next(self._it.__copy__())
            except StopIteration:
                return default

    lookahead = lookahead_tee
Usage is the same as above but you won’t pay a price here to use peek many times in a row. With a few more lines you can also look ahead more than one item in the iterator (up to available RAM).
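One way those "few more lines" might look, as a sketch for Python 3 rather than the original author's code. The class name LookaheadN and the peek(n) signature are assumptions made for illustration; the sketch relies on tee objects being copyable with copy.copy:

```python
import copy
import itertools


class LookaheadN:
    """Iterator wrapper whose peek(n) returns up to n upcoming items."""

    def __init__(self, iterable):
        # tee(..., 1) yields a single copyable iterator over the input.
        (self._it,) = itertools.tee(iterable, 1)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def peek(self, n=1):
        # Advance an independent copy; self._it keeps its position.
        probe = copy.copy(self._it)
        return list(itertools.islice(probe, n))


t = LookaheadN(range(8))
first_three = list(itertools.islice(t, 3))  # consumes 0, 1, 2
upcoming = t.peek(2)                        # looks at 3 and 4, consumes nothing
rest = list(t)                              # 3..7 are still available
```

Here peek returns a list, which is shorter than n near the end of the input, so no default value is needed.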
For the sake of completeness, the more-itertools package (which should probably be part of any Python programmer’s toolbox) includes a peekable wrapper that implements this behavior. As the code example in the documentation shows:
    >>> from more_itertools import peekable
    >>> p = peekable(['a', 'b'])
    >>> p.peek()
    'a'
    >>> next(p)
    'a'
However, it’s often possible to rewrite code that would use this functionality so that it doesn’t actually need it. For example, your realistic code sample from the question could be written like this:
    gen = element_generator()
    command = gen.next_value()
    if command == 'STOP':
        quit_application()
    else:
        process(command)
(reader’s note: I’ve preserved the syntax in the example from the question as of when I’m writing this, even though it refers to an outdated version of Python)
While itertools.chain() is the natural tool for the job here, beware of loops like this:
    for elem in gen:
        ...
        peek = next(gen)
        gen = itertools.chain([peek], gen)
…Because this will consume a linearly growing amount of memory, and eventually grind to a halt. (This code essentially seems to create a linked list, one node per chain() call.) I know this not because I inspected the libs but because this just resulted in a major slowdown of my program – getting rid of the gen = itertools.chain([peek], gen) line sped it up again. (Python 3.3)
A simple solution is to use a function like this:
    import itertools

    def peek(it):
        first = next(it)
        # return the first item plus an iterator that still starts with it
        return first, itertools.chain([first], it)
Then you can do:
    >>> it = iter(range(10))
    >>> x, it = peek(it)
    >>> x
    0
    >>> next(it)
    0
    >>> next(it)
    1
A Python 3 snippet for @jonathan-hartley’s answer:
    def peek(iterator, eoi=None):
        iterator = iter(iterator)

        try:
            prev = next(iterator)
        except StopIteration:
            return  # empty input: yield nothing

        for elm in iterator:
            yield prev, elm
            prev = elm

        yield prev, eoi

    for curr, nxt in peek(range(10)):
        print((curr, nxt))

    # (0, 1)
    # (1, 2)
    # (2, 3)
    # (3, 4)
    # (4, 5)
    # (5, 6)
    # (6, 7)
    # (7, 8)
    # (8, 9)
    # (9, None)
It’d be straightforward to create a class that does this on __iter__ and yields just the prev item and puts the elm in some attribute.
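A sketch of what such a class might look like. The names Peeking and nxt are invented for illustration; only the idea (yield the current item, expose the upcoming one as an attribute) comes from the sentence above:

```python
class Peeking:
    """Iterate over items while exposing the upcoming one as `.nxt`."""

    def __init__(self, iterable, eoi=None):
        self._iterable = iterable
        self.eoi = eoi      # value reported when there is no next item
        self.nxt = eoi

    def __iter__(self):
        iterator = iter(self._iterable)
        try:
            prev = next(iterator)
        except StopIteration:
            return          # empty input: yield nothing
        for elm in iterator:
            self.nxt = elm  # publish the lookahead before yielding
            yield prev
            prev = elm
        self.nxt = self.eoi
        yield prev


p = Peeking(range(4))
pairs = [(curr, p.nxt) for curr in p]
```

Because the lookahead lives on the object, the loop body keeps the ordinary "for curr in p" shape and reads p.nxt only when it needs to.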
If anybody is interested, and please correct me if I am wrong, but I believe it is pretty easy to add some push back functionality to any iterator.
    class Back_pushable_iterator:
        """Class whose constructor takes an iterator as its only parameter, and
        returns an iterator that behaves in the same way, with added push back
        functionality.

        The idea is to be able to push back elements that need to be retrieved once
        more with the iterator semantics. This is particularly useful to implement
        LL(k) parsers that need k tokens of lookahead. Lookahead or push back is
        really a matter of perspective. The pushing back strategy allows a clean
        parser implementation based on recursive parser functions.

        The invoker of this class takes care of storing the elements that should be
        pushed back. A consequence of this is that any elements can be "pushed
        back", even elements that have never been retrieved from the iterator.
        The elements that are pushed back are then retrieved through the iterator
        interface in a LIFO-manner (as should logically be expected).

        This class works for any iterator but is especially meaningful for a
        generator iterator, which offers no obvious push back ability.

        In the LL(k) case mentioned above, the tokenizer can be implemented by a
        standard generator function (clean and simple), that is completed by this
        class for the needs of the actual parser.
        """
        def __init__(self, iterator):
            self.iterator = iterator
            self.pushed_back = []  # stack of pushed-back elements

        def __iter__(self):
            return self

        def __next__(self):
            if self.pushed_back:
                return self.pushed_back.pop()
            else:
                return next(self.iterator)

        def push_back(self, element):
            self.pushed_back.append(element)
    it = Back_pushable_iterator(x for x in range(10))

    x = next(it)  # 0
    print(x)
    it.push_back(x)
    x = next(it)  # 0
    print(x)
    x = next(it)  # 1
    print(x)
    x = next(it)  # 2
    y = next(it)  # 3
    print(x)
    print(y)
    it.push_back(y)
    it.push_back(x)
    x = next(it)  # 2
    y = next(it)  # 3
    print(x)
    print(y)

    for x in it:
        print(x)  # 4-9
With respect to @David Z’s post, the newer seekable tool from more-itertools can reset a wrapped iterator to a prior position.
    >>> import more_itertools as mit
    >>> s = mit.seekable(range(3))
    >>> next(s)  # 0
    >>> s.seek(0)  # reset iterator
    >>> next(s)  # 0
    >>> next(s)  # 1
    >>> s.seek(1)
    >>> next(s)  # 1
    >>> next(s)  # 2
An iterator that allows peeking at the next element and also further ahead. It reads ahead as needed and remembers the values in a deque.
    from collections import deque

    class PeekIterator:

        def __init__(self, iterable):
            self.iterator = iter(iterable)
            self.peeked = deque()

        def __iter__(self):
            return self

        def __next__(self):
            if self.peeked:
                return self.peeked.popleft()
            return next(self.iterator)

        def peek(self, ahead=0):
            # read ahead until the requested position is buffered
            while len(self.peeked) <= ahead:
                self.peeked.append(next(self.iterator))
            return self.peeked[ahead]
    >>> it = PeekIterator(range(10))
    >>> it.peek()
    0
    >>> it.peek(5)
    5
    >>> it.peek(13)
    Traceback (most recent call last):
      File "<pyshell#68>", line 1, in <module>
        it.peek(13)
      File "[...]", line 15, in peek
        self.peeked.append(next(self.iterator))
    StopIteration
    >>> it.peek(2)
    2
    >>> next(it)
    0
    >>> it.peek(2)
    3
    >>> list(it)
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>>
For those of you who embrace frugality and one-liners, I present to you a one-liner that allows one to look ahead in an iterable (this only works in Python 3.8 and above):
    >>> import itertools as it
    >>> peek = lambda iterable, n=1: it.islice(zip(it.chain((t := it.tee(iterable))[0], [None] * n), it.chain([None] * n, t[1])), n, None)
    >>> for lookahead, element in peek(range(10)):
    ...     print(lookahead, element)
    1 0
    2 1
    3 2
    4 3
    5 4
    6 5
    7 6
    8 7
    9 8
    None 9
    >>> for lookahead, element in peek(range(10), 2):
    ...     print(lookahead, element)
    2 0
    3 1
    4 2
    5 3
    6 4
    7 5
    8 6
    9 7
    None 8
    None 9
This method is space-efficient by avoiding copying the iterator multiple times. It is also fast due to how it lazily generates elements. Finally, as a cherry on top, you can look ahead an arbitrary number of elements.
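For readers who find the one-liner hard to follow, here is a longer sketch that should produce the same (lookahead, element) pairs. It is not the author's code: the name peek_pairs and the fill parameter are made up, and it pads the lookahead stream with itertools.repeat instead of slicing a zipped pair of padded streams:

```python
import itertools


def peek_pairs(iterable, n=1, fill=None):
    """Yield (lookahead, element) pairs, where lookahead is n items ahead."""
    ahead, current = itertools.tee(iterable)
    # Skip the first n items on the lookahead copy, then pad its tail so
    # that zip() stops only when `current` runs out.
    ahead = itertools.chain(itertools.islice(ahead, n, None),
                            itertools.repeat(fill))
    return zip(ahead, current)


one_ahead = list(peek_pairs(range(4)))
two_ahead = list(peek_pairs(range(4), 2))
```

As with the one-liner, tee only has to buffer the n items that lie between the two copies.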
In my case, I needed a generator to which I could queue back the data I had just got via a next() call.
The way I handle this problem is to create a queue. In the implementation of the generator, I first check the queue: if the queue is not empty, the "yield" returns the values in the queue; otherwise it returns the values in the normal way.
    import queue

    def gen1(n, q):
        i = 0
        while True:
            if not q.empty():
                yield q.get()
            else:
                yield i
                i = i + 1
                if i >= n:
                    if not q.empty():
                        yield q.get()
                    break

    q = queue.Queue()

    f = gen1(2, q)

    i = next(f)
    print(i)
    i = next(f)
    print(i)
    q.put(i)  # put back the value I have just got for following 'next' call
    i = next(f)
    print(i)
    0
    1
    1
This concept was very useful when I was writing a parser that needs to read a file line by line: if a line appears to belong to the next phase of parsing, I can just queue it back to the generator so that the next phase of the code can parse it correctly without handling complex state.
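A rough sketch of that two-phase parsing idea. Everything here is illustrative: the function names, the "#" header convention, and the use of collections.deque in place of queue.Queue are all assumptions, not part of the code above:

```python
from collections import deque


def pushback_lines(lines, pending):
    """Yield lines, serving anything placed in `pending` first."""
    source = iter(lines)
    while True:
        if pending:
            yield pending.popleft()
            continue
        try:
            line = next(source)
        except StopIteration:
            return  # source exhausted and nothing queued back
        yield line


def parse(lines):
    pending = deque()
    stream = pushback_lines(lines, pending)
    header, body = [], []
    for line in stream:           # phase 1: header lines start with "#"
        if not line.startswith("#"):
            pending.append(line)  # belongs to the next phase: queue it back
            break
        header.append(line)
    for line in stream:           # phase 2 sees the queued line first
        body.append(line)
    return header, body


header, body = parse(["# a", "# b", "x", "y"])
```

Each phase stays a plain for loop over the same generator, and neither needs to know how the other decided where it ends.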
An algorithm that works by "peeking" at the next element in a generator could equivalently be one that works by remembering the previous element, treating that element as the one to operate upon, and treating the "current" element as simply "peeked at".
Either way, what is really happening is that the algorithm considers overlapping pairs from the generator. The itertools.tee recipe will work fine – and it is not hard to see that it is essentially a refactored version of Jonathan Hartley’s approach:
    from itertools import tee

    # From https://docs.python.org/3/library/itertools.html#itertools.pairwise
    # In 3.10 and up, this is directly supplied by the `itertools` module.
    def pairwise(iterable):
        # pairwise('ABCDEFG') --> AB BC CD DE EF FG
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

    def process(seq):
        for to_process, lookahead in pairwise(seq):
            # peek ahead
            if lookahead == "STOP":
                return
            # process items
            print(to_process)