How to efficiently exhaust an iterator in a one-liner?
Question:
If I have an iterator it and want to exhaust it, I can write:
for x in it:
    pass
Is there a builtin or standard library call which allows me to do it in a one-liner? Of course I could do:
list(it)
which will build a list from the iterator and then discard it. But I consider that inefficient because of the list-building step. It’s of course trivial to write myself a helper function that does the empty for loop, but I am curious if there is something else I am missing.
Answers:
From the itertools recipes:
# feed the entire iterator into a zero-length deque
collections.deque(iterator, maxlen=0)
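As a quick check of this idiom, the sketch below drains a generator and confirms that nothing is left afterwards. The names noisy and seen are made up for illustration; they are not part of any library.

```python
from collections import deque

seen = []  # records side effects so we can tell the generator really ran

def noisy(n):
    """Hypothetical generator that records each value before yielding it."""
    for i in range(n):
        seen.append(i)
        yield i

it = noisy(3)
deque(it, maxlen=0)  # pulls every element, stores none of them

# The default is returned because the iterator has nothing left to give.
leftover = next(it, "exhausted")
```

A zero-length deque discards each item as it arrives, so memory use stays constant no matter how long the iterator is.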
You could use sum:
sum(0 for _ in it)
or similarly, using reduce:
reduce(lambda x, y: y, it)
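One caveat with the reduce variant: without an initial value, reduce raises TypeError when the iterator is already empty. A minimal sketch of a safer form follows; the helper name drain_with_reduce is hypothetical, and passing None as the initial value is just one way to avoid the error.

```python
from functools import reduce

def drain_with_reduce(it):
    """Exhaust `it`; the initial value None keeps empty iterators from raising."""
    reduce(lambda x, y: y, it, None)

# Without an initial value, an empty iterator makes reduce raise TypeError.
try:
    reduce(lambda x, y: y, iter([]))
    raised = False
except TypeError:
    raised = True

it = iter([1, 2, 3])
drain_with_reduce(it)
remaining = list(it)          # nothing left after draining

drain_with_reduce(iter([]))   # empty input is fine with the initial value
```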
object() in it
If you know the iterator will never produce a certain kind of object, you can also use that instead, e.g. None in it or () in it. The newly-created object() works pretty much universally, because it’ll never test equal to anything else (barring shenanigans).
I’m not advocating this idiom; the for loop in the question is in many ways the best solution. But if you’re looking for a creepily “elegant” answer in the sense that it does the minimum possible side-computation while still being a very neat one-liner (as opposed to e.g. any(False for _ in it)) then this may be it.
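To make the "barring shenanigans" caveat concrete, here is a small sketch. The class EqualsEverything is hypothetical, invented only to show how a membership test on an iterator can stop early when an element claims to equal anything.

```python
class EqualsEverything:
    """Hypothetical 'shenanigans' object: compares equal to anything."""
    def __eq__(self, other):
        return True

# Normal case: a fresh object() matches nothing, so the search runs to the end.
it = iter([1, 2, 3])
found = object() in it
after_plain = list(it)

# Odd case: the second element reports equality, so the search stops there
# and the rest of the iterator is left unconsumed.
it2 = iter([1, EqualsEverything(), 3])
found2 = object() in it2
after_odd = list(it2)
```

In the normal case found is False and the iterator is empty; in the odd case found2 is True and the element 3 is still waiting in it2.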
The built-in all() function should be extremely cheap and simple:
all(True for _ in it)
Edit: Fixed, thank you @hemflit !
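The choice of all with a constant True matters because all and any short-circuit. A small sketch of the difference, using throwaway variable names:

```python
# all() keeps going while every value is truthy, so this drains the iterator.
it = iter(range(5))
all(True for _ in it)
left_after_all = list(it)

# any() stops at the first truthy value, so only one element is consumed.
it = iter(range(5))
any(True for _ in it)
left_after_any = list(it)
```

Here left_after_all is empty, while left_after_any still holds the four elements that any never reached.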
Note that your suggestion can also be formulated as a one-liner:
for _ in it: pass
And I just made:
def exhaust(it):
    for _ in it:
        pass
It’s not as fast as the deque solution (10% slower on my laptop), but I find it cleaner.
2022 update (bounty asks): There’s no "dedicated function" for it in the standard library, and deque(it, 0) is still the most efficient. That’s why it’s used in itertools’s consume recipe and more-itertools’s consume function (click on [source] there).
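For reference, the consume recipe from the itertools documentation looks roughly like the following. It is reproduced from memory, so treat it as a sketch and check the docs for the current wording; the usage lines at the bottom are my own.

```python
from collections import deque
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n steps ahead. If n is None, consume entirely."
    if n is None:
        # Feed the entire iterator into a zero-length deque.
        deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)

it = iter(range(10))
consume(it, 3)               # skip the first three items
first_after_skip = next(it)
consume(it)                  # drain whatever is left
rest = list(it)
```

The n parameter is what the recipe adds over the bare deque call: it lets you skip a fixed number of items instead of draining everything.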
Benchmark of the various proposals, iteration time per element, iterating itertools.repeat(None, 10**5) (with CPython 3.10):
2.7 ns ± 0.1 ns consume_deque
6.5 ns ± 0.0 ns consume_loop
6.5 ns ± 0.0 ns consume_all_if_False
13.9 ns ± 0.3 ns consume_object_in
27.0 ns ± 0.1 ns consume_all_True
29.4 ns ± 0.3 ns consume_sum_0
44.8 ns ± 0.1 ns consume_reduce
The deque one wins due to being C and having a fast path for maxlen == 0 which does nothing with the elements.
The simple loop gets second place, fastest with Python iteration. The other solutions previously proposed here waste more or less time by doing more or less work with each element. I added consume_all_if_False to show how to do an all/sum efficiently: have an if False clause so your generator doesn’t produce anything.
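A minimal sketch of why the if False clause helps: the generator expression still pulls every element from the source, but it never yields anything to all or sum, so they have no per-element work to do. The names source and pulled are made up for illustration.

```python
pulled = []  # records what the source was actually asked to produce

def source():
    """Hypothetical generator that logs each element it hands out."""
    for i in range(4):
        pulled.append(i)
        yield i

it = source()
# The filter rejects every element, so all() receives an empty stream
# and returns True, yet `it` has been driven to exhaustion.
result = all(_ for _ in it if False)
leftover = list(it)

# The same trick works with sum(): nothing is ever added.
total = sum(0 for _ in source() if False)
```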
Benchmark code:
from timeit import default_timer as timer
from itertools import repeat
from collections import deque
from functools import reduce
from random import shuffle
from statistics import mean, stdev

def consume_loop(it):
    for _ in it:
        pass

def consume_deque(it):
    deque(it, 0)

def consume_object_in(it):
    object() in it

def consume_all_True(it):
    all(True for _ in it)

def consume_all_if_False(it):
    all(_ for _ in it if False)

def consume_sum_0(it):
    sum(0 for _ in it)

def consume_reduce(it):
    reduce(lambda x, y: y, it)

funcs = [
    consume_loop,
    consume_deque,
    consume_object_in,
    consume_all_True,
    consume_all_if_False,
    consume_sum_0,
    consume_reduce,
]

times = {f: [] for f in funcs}

def stats(f):
    # Use the five fastest runs, converted to nanoseconds per element.
    ts = [t * 1e9 for t in sorted(times[f])[:5]]
    return f'{mean(ts):5.1f} ns ± {stdev(ts):3.1f} ns'

# 25 rounds, shuffled each round so no function always runs in the same slot.
for _ in range(25):
    shuffle(funcs)
    for f in funcs:
        n = 10**5
        it = repeat(None, n)
        t0 = timer()
        f(it)
        t1 = timer()
        times[f].append((t1 - t0) / n)

for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)