Weird lambda behaviour in loops
Question:
I stumbled upon a behaviour in Python that I have a hard time understanding. This is the proof-of-concept code:
from functools import partial

if __name__ == '__main__':
    sequence = ['foo', 'bar', 'spam']
    loop_one = lambda seq: [lambda: el for el in seq]
    no_op = lambda x: x
    loop_two = lambda seq: [partial(no_op, el) for el in seq]
    for func in (loop_one, loop_two):
        print [f() for f in func(sequence)]
The output of the above is:
['spam', 'spam', 'spam']
['foo', 'bar', 'spam']
The behaviour of loop_one is surprising to me, as I would expect it to behave like loop_two: el is an immutable value (a string) that changes at each iteration, but the lambda seems to store a pointer to the “looping variable”, as if the loop recycled the same memory address for each element of the sequence.
The same thing happens with full-blown functions containing a for loop (so it is not specific to list-comprehension syntax).
But wait: there is more… and more puzzling!
The following script works like loop_one:
b = []
for foo in ("foo", "bar"):
    b.append(lambda: foo)
print [a() for a in b]
(output: ['bar', 'bar'])
But watch what happens when one substitutes the variable name foo with a:
b = []
for a in ("foo", "bar"):
    b.append(lambda: a)
print [a() for a in b]
(output: [<function <lambda> at 0x25cce60>, <function <lambda> at 0x25cced8>])
Any idea of what is happening here? I suspect there must be some gotcha related to the underlying C implementation of my interpreter, but I don’t have anything else (Jython, PyPy or similar) to test whether this behaviour is consistent across different implementations.
Answers:
The variable (foo in the following example) is bound not when the lambda is created, but when the lambda is called.
>>> b = []
>>> for foo in ("foo", "bar"):
...     b.append(lambda: foo)
...
>>> foo = "spam"
>>> print [a() for a in b]
['spam', 'spam']
>>> b = []
>>> for foo in ("foo", "bar"):
...     b.append(lambda foo=foo: foo)
...
>>> print [a() for a in b]
['foo', 'bar']
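The same call-time lookup appears to account for the second puzzle in the question. In Python 2 a list comprehension does not get its own scope, so [a() for a in b] rebinds the very name a that each lambda reads, and every call returns the lambda that is currently being called. The sketch below reproduces this with an explicit for loop so that it behaves the same way on Python 2 and Python 3; the function name rebinding_demo is made up for illustration and is not part of the question.

```python
def rebinding_demo():
    # Hypothetical helper (not from the question): wraps the experiment in a function.
    b = []
    for a in ("foo", "bar"):
        b.append(lambda: a)   # each lambda looks `a` up only when it is called

    results = []
    for a in b:               # rebinds the same `a` that the lambdas refer to
        results.append(a())   # so each call returns the lambda being called
    return b, results


funcs, results = rebinding_demo()
print(all(r is f for r, f in zip(results, funcs)))  # prints True
```

Renaming the second loop variable to anything other than a would give back the 'bar', 'bar' result from the first script.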
The function lambda: el used in loop_one refers to a variable el which is not defined in the local scope. Therefore, Python looks for it next in the enclosing scope of the other lambda:
lambda seq: [lambda: el for el in seq]
in accordance with the so-called LEGB rule.
By the time lambda: el is called, this enclosing lambda has (of course) already been called and the list comprehension has been evaluated. The el used in the list comprehension is a local variable in this enclosing lambda. Its value is the one returned when Python looks for the value of el in lambda: el. That value for el is the same for all the different lambda: el functions in the list comprehension: it is the last value assigned to el in the for el in seq loop. Thus, el is always 'spam', the last value in seq.
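One way to check this explanation is to inspect the closure of each inner function. The sketch below rewrites loop_one with def purely for readability (an assumption for illustration, the behaviour should match the lambda version) and uses the standard __closure__ attribute to show that every inner lambda shares a single cell for el.

```python
def loop_one(seq):
    # Same shape as the question's loop_one, written with def for readability.
    return [lambda: el for el in seq]


funcs = loop_one(['foo', 'bar', 'spam'])

# Each inner lambda has exactly one free variable, el, stored in a closure cell.
cells = [f.__closure__[0] for f in funcs]

print(all(c is cells[0] for c in cells))  # prints True: one shared cell
print(cells[0].cell_contents)             # prints spam
print([f() for f in funcs])               # prints ['spam', 'spam', 'spam']
```

Because the lambdas share the cell rather than a copy of its value, they all see whatever el held when the loop finished.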
You’ve already found one workaround: binding the value when the function is created, as functools.partial does in your loop_two. Another way is to define el as a local variable with a default value:
loop_one = lambda seq: [lambda el=el: el for el in seq]
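A third option, not mentioned above, is a small factory function: each call creates a fresh enclosing scope, so every returned lambda closes over its own variable. This is only a sketch; make_getter and loop_three are hypothetical names chosen for the example.

```python
def make_getter(value):
    # Hypothetical helper: each call gets its own `value`, so nothing is shared.
    return lambda: value


# Hypothetical counterpart to loop_one that goes through the factory.
loop_three = lambda seq: [make_getter(el) for el in seq]

print([f() for f in loop_three(['foo', 'bar', 'spam'])])  # prints ['foo', 'bar', 'spam']
```

Unlike the default-argument version, the returned functions take no parameters, so a caller cannot override the stored value by accident.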