ProcessPoolExecutor cannot execute my own functions but executing print works
Question:
Code:
import time
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    p = ProcessPoolExecutor()
    p.submit(lambda x: print(x), "something")  # doesn't work
    p.submit(print, "something")  # works fine
    time.sleep(0.5)
Why does this make sense?
Answers:
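ProcessPoolExecutor has to pickle the function in order to send it to a worker process. pickle stores a function as a reference to its name, and since a lambda has no name of its own, it cannot be found to get pickled.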
For example:
from pickle import dumps

def fun(x):
    print(x)

lmb = lambda x: print(x)

dumps(fun)  # succeeds
dumps(lmb)  # fails
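To make the difference visible without reading a traceback, here is a minimal sketch. The helper can_pickle is a hypothetical name introduced for illustration; it simply tries pickle.dumps and reports whether it succeeded. The built-in print is used as the named function because it can always be found in the builtins module.

```python
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A function is stored as a reference: module name plus function name,
# not as the function's code.
payload = pickle.dumps(print)
print(b"builtins" in payload, b"print" in payload)  # True True

print(can_pickle(print))        # True: found by name in builtins
print(can_pickle(lambda x: x))  # False: '<lambda>' cannot be looked up
```

The pickled payload contains only the names needed to find the function again, which is why a function without a usable name cannot be stored.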
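A ThreadPoolExecutor does not pickle anything, because its threads share the memory of the calling process, so this will work just fine: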
import time
from concurrent.futures import ThreadPoolExecutor

def fun(x):
    print(x)

if __name__ == "__main__":
    p = ThreadPoolExecutor()
    lmb = lambda x: print(x)
    p.submit(lmb, "lambda")  # works fine
    p.submit(fun, "local function")  # works fine
    p.submit(print, "built-in function")  # works fine
    time.sleep(0.5)
But if you replace the ThreadPoolExecutor() with the ProcessPoolExecutor(), which will need to pickle the function, the lambda stops working.
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    p = ProcessPoolExecutor()
    lmb = lambda x: print(x)
    future = p.submit(lmb, "lambda")  # doesn't work
    print(future.result())
This shows that the problem does indeed occur when pickling, and it also makes clear why:
_pickle.PicklingError: Can't pickle <function <lambda> at 0x00000294E66B3E20>: attribute lookup <lambda> on __main__ failed
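__main__ is the module that the top-level script runs as. It does not have the lambda in its namespace under the name pickle looks for, because a lambda itself is nameless: its __name__ is '<lambda>'. Assigning it to a variable like lmb doesn’t change that, since pickle looks up the function’s own name, not the variable it happens to be bound to. The other two functions inherently have a name that can be found (fun in the namespace of __main__, print in builtins), and can be pickled.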
Note that __main__ is the same name you’d test for in:
if __name__ == "__main__":
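Given that explanation, one way to stay within the standard library is to give the function a module-level name, and to use functools.partial when an argument needs to be bound in advance. The following is a minimal sketch rather than the only possible fix; show and main are hypothetical names chosen for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def show(prefix, x):
    """Module-level function: picklable because it can be found by name."""
    line = f"{prefix}{x}"
    print(line)
    return line


def main():
    with ProcessPoolExecutor(max_workers=2) as pool:
        # A named function takes the place of the lambda.
        first = pool.submit(show, "", "something")
        # partial binds an argument; it pickles as long as the wrapped function does.
        second = pool.submit(partial(show, "value: "), "something")
        # result() re-raises any exception, so a pickling failure is not silent.
        print(first.result(), second.result())


if __name__ == "__main__":
    main()
```

Calling result() on each future matters here: without it, the error from the original snippet is stored on the future and never shown.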
If you check the result of the future, you will see the relevant error message:
>>> from concurrent.futures import ProcessPoolExecutor
>>> p = ProcessPoolExecutor()
>>> p.submit(lambda x: print(x), "something").result()
...
PicklingError: Can't pickle <function <lambda> at 0x113eec5e0>: attribute lookup <lambda> on __main__ failed
So this error message is pretty self-explanatory if you’re familiar with how a process pool executor works. But if you’re not, then some more explanation might be needed: pickle refers to the worker function by name, but since lambdas are "anonymous" functions they don’t have a valid name to look up in the module namespace.
>>> (lambda x: print(x)).__name__
'<lambda>'
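The by-name lookup can be imitated in a few lines. This is a simplified sketch of the idea, not pickle's actual implementation; resolve and anonymous are hypothetical names introduced for illustration.

```python
import importlib


def resolve(module_name, qualname):
    """Imitate a by-name lookup: import the module, then walk the dotted name."""
    obj = importlib.import_module(module_name)
    for part in qualname.split("."):
        obj = getattr(obj, part)
    return obj


anonymous = lambda x: print(x)

# Binding the lambda to a variable does not rename the function object.
print(anonymous.__name__)  # <lambda>

# A named function resolves back to the very same object.
print(resolve("builtins", print.__qualname__) is print)  # True

# '<lambda>' is not an attribute of __main__, so the lookup fails,
# which mirrors "attribute lookup <lambda> on __main__ failed".
try:
    resolve("__main__", anonymous.__name__)
except AttributeError as exc:
    print("lookup failed:", exc)
```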
As a workaround, you could use a pathos multiprocessing pool, which uses dill, a more powerful serialization library than pickle. Unlike pickle, dill is able to serialize lambdas. The ProcessPool interface in pathos is a little different from the stdlib concurrent.futures ProcessPoolExecutor, but the closest analogy to your simple submit usage would be a pipe:
>>> from pathos.multiprocessing import ProcessPool
>>> p = ProcessPool()
>>> p.pipe(lambda x: print(x), "something")
something
If you’re curious how dill is able to dump lambdas, turn on the tracing and check out what it actually does with this snippet:
import dill.detect
dill.detect.trace(True)
dill.dumps(lambda x: print(x))