Python: generator expression vs. yield
Question:
In Python, is there any difference between creating a generator object through a generator expression versus using the yield statement?
Using yield:
def Generator(x, y):
for i in xrange(x):
for j in xrange(y):
yield(i, j)
Using generator expression:
def Generator(x, y):
return ((i, j) for i in xrange(x) for j in xrange(y))
Both functions return generator objects, which produce tuples, e.g. (0,0), (0,1) etc.
Any advantages of one or the other? Thoughts?
Answers:
Using yield
is nice if the expression is more complicated than just nested loops. Among other things you can return a special first or special last value. Consider:
def Generator(x):
for i in xrange(x):
yield(i)
yield(None)
In this example, not really. But yield
can be used for more complex constructs – for example it can accept values from the caller as well and modify the flow as a result. Read PEP 342 for more details (it’s an interesting technique worth knowing).
Anyway, the best advice is use whatever is clearer for your needs.
P.S. Here’s a simple coroutine example from Dave Beazley:
def grep(pattern):
print "Looking for %s" % pattern
while True:
line = (yield)
if pattern in line:
print line,
# Example use
if __name__ == '__main__':
g = grep("python")
g.next()
g.send("Yeah, but no, but yeah, but no")
g.send("A series of tubes")
g.send("python generators rock!")
When thinking about iterators, the itertools
module:
… standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an “iterator algebra” making it possible to construct specialized tools succinctly and efficiently in pure Python.
For performance, consider itertools.product(*iterables[, repeat])
Cartesian product of input iterables.
Equivalent to nested for-loops in a generator expression. For example, product(A, B)
returns the same as ((x,y) for x in A for y in B)
.
>>> import itertools
>>> def gen(x,y):
... return itertools.product(xrange(x),xrange(y))
...
>>> [t for t in gen(3,2)]
[(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
>>>
There is no difference for the kind of simple loops that you can fit into a generator expression. However yield can be used to create generators that do much more complex processing. Here is a simple example for generating the fibonacci sequence:
>>> def fibgen():
... a = b = 1
... while True:
... yield a
... a, b = b, a+b
>>> list(itertools.takewhile((lambda x: x<100), fibgen()))
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
There are only slight differences in the two. You can use the dis
module to examine this sort of thing for yourself.
Edit: My first version decompiled the generator expression created at module-scope in the interactive prompt. That’s slightly different from the OP’s version with it used inside a function. I’ve modified this to match the actual case in the question.
As you can see below, the “yield” generator (first case) has three extra instructions in the setup, but from the first FOR_ITER
they differ in only one respect: the “yield” approach uses a LOAD_FAST
in place of a LOAD_DEREF
inside the loop. The LOAD_DEREF
is “rather slower” than LOAD_FAST
, so it makes the “yield” version slightly faster than the generator expression for large enough values of x
(the outer loop) because the value of y
is loaded slightly faster on each pass. For smaller values of x
it would be slightly slower because of the extra overhead of the setup code.
It might also be worth pointing out that the generator expression would usually be used inline in the code, rather than wrapping it with the function like that. That would remove a bit of the setup overhead and keep the generator expression slightly faster for smaller loop values even if LOAD_FAST
gave the “yield” version an advantage otherwise.
In neither case would the performance difference be enough to justify deciding between one or the other. Readability counts far more, so use whichever feels most readable for the situation at hand.
>>> def Generator(x, y):
... for i in xrange(x):
... for j in xrange(y):
... yield(i, j)
...
>>> dis.dis(Generator)
2 0 SETUP_LOOP 54 (to 57)
3 LOAD_GLOBAL 0 (xrange)
6 LOAD_FAST 0 (x)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 40 (to 56)
16 STORE_FAST 2 (i)
3 19 SETUP_LOOP 31 (to 53)
22 LOAD_GLOBAL 0 (xrange)
25 LOAD_FAST 1 (y)
28 CALL_FUNCTION 1
31 GET_ITER
>> 32 FOR_ITER 17 (to 52)
35 STORE_FAST 3 (j)
4 38 LOAD_FAST 2 (i)
41 LOAD_FAST 3 (j)
44 BUILD_TUPLE 2
47 YIELD_VALUE
48 POP_TOP
49 JUMP_ABSOLUTE 32
>> 52 POP_BLOCK
>> 53 JUMP_ABSOLUTE 13
>> 56 POP_BLOCK
>> 57 LOAD_CONST 0 (None)
60 RETURN_VALUE
>>> def Generator_expr(x, y):
... return ((i, j) for i in xrange(x) for j in xrange(y))
...
>>> dis.dis(Generator_expr.func_code.co_consts[1])
2 0 SETUP_LOOP 47 (to 50)
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 40 (to 49)
9 STORE_FAST 1 (i)
12 SETUP_LOOP 31 (to 46)
15 LOAD_GLOBAL 0 (xrange)
18 LOAD_DEREF 0 (y)
21 CALL_FUNCTION 1
24 GET_ITER
>> 25 FOR_ITER 17 (to 45)
28 STORE_FAST 2 (j)
31 LOAD_FAST 1 (i)
34 LOAD_FAST 2 (j)
37 BUILD_TUPLE 2
40 YIELD_VALUE
41 POP_TOP
42 JUMP_ABSOLUTE 25
>> 45 POP_BLOCK
>> 46 JUMP_ABSOLUTE 6
>> 49 POP_BLOCK
>> 50 LOAD_CONST 0 (None)
53 RETURN_VALUE
In usage, note a distinction between a generator object vs a generator function.
A generator object is use-once-only, in contrast to a generator function, which can be reused each time you call it again, because it returns a fresh generator object.
Generator expressions are in practice usually used “raw”, without wrapping them in a function, and they return a generator object.
E.g.:
def range_10_gen_func():
x = 0
while x < 10:
yield x
x = x + 1
print(list(range_10_gen_func()))
print(list(range_10_gen_func()))
print(list(range_10_gen_func()))
which outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Compare with a slightly different usage:
range_10_gen = range_10_gen_func()
print(list(range_10_gen))
print(list(range_10_gen))
print(list(range_10_gen))
which outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
[]
And compare with a generator expression:
range_10_gen_expr = (x for x in range(10))
print(list(range_10_gen_expr))
print(list(range_10_gen_expr))
print(list(range_10_gen_expr))
which also outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
[]
There is a difference that could be important in some contexts that hasn’t been pointed out yet. Using yield
prevents you from using return
for something else than implicitly raising StopIteration (and coroutines related stuff).
This means this code is ill-formed (and feeding it to an interpreter will give you an AttributeError
):
class Tea:
"""With a cloud of milk, please"""
def __init__(self, temperature):
self.temperature = temperature
def mary_poppins_purse(tea_time=False):
"""I would like to make one thing clear: I never explain anything."""
if tea_time:
return Tea(355)
else:
for item in ['lamp', 'mirror', 'coat rack', 'tape measure', 'ficus']:
yield item
print(mary_poppins_purse(True).temperature)
On the other hand, this code works like a charm:
class Tea:
"""With a cloud of milk, please"""
def __init__(self, temperature):
self.temperature = temperature
def mary_poppins_purse(tea_time=False):
"""I would like to make one thing clear: I never explain anything."""
if tea_time:
return Tea(355)
else:
return (item for item in ['lamp', 'mirror', 'coat rack',
'tape measure', 'ficus'])
print(mary_poppins_purse(True).temperature)
Yes there is a difference.
For the generator expression (x for var in expr)
, iter(expr)
is called when the expression is created.
When using def
and yield
to create a generator, as in:
def my_generator():
for var in expr:
yield x
g = my_generator()
iter(expr)
is not yet called. It will be called only when iterating on g
(and might not be called at all).
Taking this iterator as an example:
from __future__ import print_function
class CountDown(object):
def __init__(self, n):
self.n = n
def __iter__(self):
print("ITER")
return self
def __next__(self):
if self.n == 0:
raise StopIteration()
self.n -= 1
return self.n
next = __next__ # for python2
This code:
g1 = (i ** 2 for i in CountDown(3)) # immediately prints "ITER"
print("Go!")
for x in g1:
print(x)
while:
def my_generator():
for i in CountDown(3):
yield i ** 2
g2 = my_generator()
print("Go!")
for x in g2: # "ITER" is only printed here
print(x)
Since most iterators do not do a lot of stuff in __iter__
, it is easy to miss this behavior. A real world example would be Django’s QuerySet
, which fetch data in __iter__
and data = (f(x) for x in qs)
might take a lot of time, while def g(): for x in qs: yield f(x)
followed by data=g()
would return immediately.
For more info and the formal definition refer to PEP 289 — Generator Expressions.
In Python, is there any difference between creating a generator object through a generator expression versus using the yield statement?
Using yield:
def Generator(x, y):
for i in xrange(x):
for j in xrange(y):
yield(i, j)
Using generator expression:
def Generator(x, y):
return ((i, j) for i in xrange(x) for j in xrange(y))
Both functions return generator objects, which produce tuples, e.g. (0,0), (0,1) etc.
Any advantages of one or the other? Thoughts?
Using yield
is nice if the expression is more complicated than just nested loops. Among other things you can return a special first or special last value. Consider:
def Generator(x):
for i in xrange(x):
yield(i)
yield(None)
In this example, not really. But yield
can be used for more complex constructs – for example it can accept values from the caller as well and modify the flow as a result. Read PEP 342 for more details (it’s an interesting technique worth knowing).
Anyway, the best advice is use whatever is clearer for your needs.
P.S. Here’s a simple coroutine example from Dave Beazley:
def grep(pattern):
print "Looking for %s" % pattern
while True:
line = (yield)
if pattern in line:
print line,
# Example use
if __name__ == '__main__':
g = grep("python")
g.next()
g.send("Yeah, but no, but yeah, but no")
g.send("A series of tubes")
g.send("python generators rock!")
When thinking about iterators, the itertools
module:
… standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an “iterator algebra” making it possible to construct specialized tools succinctly and efficiently in pure Python.
For performance, consider itertools.product(*iterables[, repeat])
Cartesian product of input iterables.
Equivalent to nested for-loops in a generator expression. For example,
product(A, B)
returns the same as((x,y) for x in A for y in B)
.
>>> import itertools
>>> def gen(x,y):
... return itertools.product(xrange(x),xrange(y))
...
>>> [t for t in gen(3,2)]
[(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
>>>
There is no difference for the kind of simple loops that you can fit into a generator expression. However yield can be used to create generators that do much more complex processing. Here is a simple example for generating the fibonacci sequence:
>>> def fibgen():
... a = b = 1
... while True:
... yield a
... a, b = b, a+b
>>> list(itertools.takewhile((lambda x: x<100), fibgen()))
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
There are only slight differences in the two. You can use the dis
module to examine this sort of thing for yourself.
Edit: My first version decompiled the generator expression created at module-scope in the interactive prompt. That’s slightly different from the OP’s version with it used inside a function. I’ve modified this to match the actual case in the question.
As you can see below, the “yield” generator (first case) has three extra instructions in the setup, but from the first FOR_ITER
they differ in only one respect: the “yield” approach uses a LOAD_FAST
in place of a LOAD_DEREF
inside the loop. The LOAD_DEREF
is “rather slower” than LOAD_FAST
, so it makes the “yield” version slightly faster than the generator expression for large enough values of x
(the outer loop) because the value of y
is loaded slightly faster on each pass. For smaller values of x
it would be slightly slower because of the extra overhead of the setup code.
It might also be worth pointing out that the generator expression would usually be used inline in the code, rather than wrapping it with the function like that. That would remove a bit of the setup overhead and keep the generator expression slightly faster for smaller loop values even if LOAD_FAST
gave the “yield” version an advantage otherwise.
In neither case would the performance difference be enough to justify deciding between one or the other. Readability counts far more, so use whichever feels most readable for the situation at hand.
>>> def Generator(x, y):
... for i in xrange(x):
... for j in xrange(y):
... yield(i, j)
...
>>> dis.dis(Generator)
2 0 SETUP_LOOP 54 (to 57)
3 LOAD_GLOBAL 0 (xrange)
6 LOAD_FAST 0 (x)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 40 (to 56)
16 STORE_FAST 2 (i)
3 19 SETUP_LOOP 31 (to 53)
22 LOAD_GLOBAL 0 (xrange)
25 LOAD_FAST 1 (y)
28 CALL_FUNCTION 1
31 GET_ITER
>> 32 FOR_ITER 17 (to 52)
35 STORE_FAST 3 (j)
4 38 LOAD_FAST 2 (i)
41 LOAD_FAST 3 (j)
44 BUILD_TUPLE 2
47 YIELD_VALUE
48 POP_TOP
49 JUMP_ABSOLUTE 32
>> 52 POP_BLOCK
>> 53 JUMP_ABSOLUTE 13
>> 56 POP_BLOCK
>> 57 LOAD_CONST 0 (None)
60 RETURN_VALUE
>>> def Generator_expr(x, y):
... return ((i, j) for i in xrange(x) for j in xrange(y))
...
>>> dis.dis(Generator_expr.func_code.co_consts[1])
2 0 SETUP_LOOP 47 (to 50)
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 40 (to 49)
9 STORE_FAST 1 (i)
12 SETUP_LOOP 31 (to 46)
15 LOAD_GLOBAL 0 (xrange)
18 LOAD_DEREF 0 (y)
21 CALL_FUNCTION 1
24 GET_ITER
>> 25 FOR_ITER 17 (to 45)
28 STORE_FAST 2 (j)
31 LOAD_FAST 1 (i)
34 LOAD_FAST 2 (j)
37 BUILD_TUPLE 2
40 YIELD_VALUE
41 POP_TOP
42 JUMP_ABSOLUTE 25
>> 45 POP_BLOCK
>> 46 JUMP_ABSOLUTE 6
>> 49 POP_BLOCK
>> 50 LOAD_CONST 0 (None)
53 RETURN_VALUE
In usage, note a distinction between a generator object vs a generator function.
A generator object is use-once-only, in contrast to a generator function, which can be reused each time you call it again, because it returns a fresh generator object.
Generator expressions are in practice usually used “raw”, without wrapping them in a function, and they return a generator object.
E.g.:
def range_10_gen_func():
x = 0
while x < 10:
yield x
x = x + 1
print(list(range_10_gen_func()))
print(list(range_10_gen_func()))
print(list(range_10_gen_func()))
which outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Compare with a slightly different usage:
range_10_gen = range_10_gen_func()
print(list(range_10_gen))
print(list(range_10_gen))
print(list(range_10_gen))
which outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
[]
And compare with a generator expression:
range_10_gen_expr = (x for x in range(10))
print(list(range_10_gen_expr))
print(list(range_10_gen_expr))
print(list(range_10_gen_expr))
which also outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
[]
There is a difference that could be important in some contexts that hasn’t been pointed out yet. Using yield
prevents you from using return
for something else than implicitly raising StopIteration (and coroutines related stuff).
This means this code is ill-formed (and feeding it to an interpreter will give you an AttributeError
):
class Tea:
"""With a cloud of milk, please"""
def __init__(self, temperature):
self.temperature = temperature
def mary_poppins_purse(tea_time=False):
"""I would like to make one thing clear: I never explain anything."""
if tea_time:
return Tea(355)
else:
for item in ['lamp', 'mirror', 'coat rack', 'tape measure', 'ficus']:
yield item
print(mary_poppins_purse(True).temperature)
On the other hand, this code works like a charm:
class Tea:
"""With a cloud of milk, please"""
def __init__(self, temperature):
self.temperature = temperature
def mary_poppins_purse(tea_time=False):
"""I would like to make one thing clear: I never explain anything."""
if tea_time:
return Tea(355)
else:
return (item for item in ['lamp', 'mirror', 'coat rack',
'tape measure', 'ficus'])
print(mary_poppins_purse(True).temperature)
Yes there is a difference.
For the generator expression (x for var in expr)
, iter(expr)
is called when the expression is created.
When using def
and yield
to create a generator, as in:
def my_generator():
for var in expr:
yield x
g = my_generator()
iter(expr)
is not yet called. It will be called only when iterating on g
(and might not be called at all).
Taking this iterator as an example:
from __future__ import print_function
class CountDown(object):
def __init__(self, n):
self.n = n
def __iter__(self):
print("ITER")
return self
def __next__(self):
if self.n == 0:
raise StopIteration()
self.n -= 1
return self.n
next = __next__ # for python2
This code:
g1 = (i ** 2 for i in CountDown(3)) # immediately prints "ITER"
print("Go!")
for x in g1:
print(x)
while:
def my_generator():
for i in CountDown(3):
yield i ** 2
g2 = my_generator()
print("Go!")
for x in g2: # "ITER" is only printed here
print(x)
Since most iterators do not do a lot of stuff in __iter__
, it is easy to miss this behavior. A real world example would be Django’s QuerySet
, which fetch data in __iter__
and data = (f(x) for x in qs)
might take a lot of time, while def g(): for x in qs: yield f(x)
followed by data=g()
would return immediately.
For more info and the formal definition refer to PEP 289 — Generator Expressions.