ProcessPoolExecutor cannot execute my own functions but executing print works

Question:

Code:

import time
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    p = ProcessPoolExecutor()
    p.submit(lambda x: print(x), "something")  # doesn't work
    p.submit(print, "something")  # works fine
    time.sleep(0.5)

Why does this make sense?

Asked By: SpaceMonkey


Answers:

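ProcessPoolExecutor has to pickle the function to send it to a worker process. pickle stores a function by reference (its module plus its name), and a lambda has no real name (its name is just '<lambda>'), so it cannot be looked up and pickling fails.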

For example:

from pickle import dumps

def fun(x):
    print(x)

lmb = lambda x: print(x)

dumps(fun)  # succeeds
dumps(lmb)  # fails
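To see what "cannot be found" means in practice, here is a small standard-library sketch that inspects what pickle writes. It suggests that a function is saved as a reference (module name plus qualified name) rather than as code, and that a lambda's qualified name is just '<lambda>'. AttributeError is caught alongside PicklingError only as a precaution, since the exact exception and message differ between Python versions.

```python
import pickle

# pickle writes a function as a reference (module name + qualified name), not as code.
data = pickle.dumps(print)
print(b"builtins" in data, b"print" in data)  # True True

# Unpickling looks that name up again, so you get the very same object back.
print(pickle.loads(data) is print)  # True

# A lambda's qualified name is "<lambda>", which no module has as an attribute.
lmb = lambda x: print(x)
print(lmb.__qualname__)  # <lambda>

try:
    pickle.dumps(lmb)
except (pickle.PicklingError, AttributeError) as exc:
    # The exact exception text differs between Python versions.
    print("cannot pickle:", type(exc).__name__)
```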

A ThreadPoolExecutor runs its tasks in the same process and never pickles them, so this will work just fine:

import time
from concurrent.futures import ThreadPoolExecutor


def fun(x):
    print(x)


if __name__ == "__main__":
    p = ThreadPoolExecutor()
    lmb = lambda x: print(x)
    p.submit(lmb, "lambda")  # works fine
    p.submit(fun, "module-level function")  # works fine
    p.submit(print, "built-in function")  # works fine
    time.sleep(0.5)

But if you replace ThreadPoolExecutor() with ProcessPoolExecutor(), which has to pickle the function to send it to a worker process, the lambda stops working:

from concurrent.futures import ProcessPoolExecutor


if __name__ == "__main__":
    p = ProcessPoolExecutor()
    lmb = lambda x: print(x)
    future = p.submit(lmb, "lambda")  # doesn't work
    print(future.result())

This shows that the problem does indeed occur when pickling, and it also makes clear why:

_pickle.PicklingError: Can't pickle <function <lambda> at 0x00000294E66B3E20>: attribute lookup <lambda> on __main__ failed

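__main__ is the module being run as the script (not the main process). pickle records a function as its module name plus its qualified name and checks that this name can be looked up again. A lambda's qualified name is '<lambda>', and __main__ has no attribute by that name. Assigning the lambda to a variable like lmb doesn’t change that: the variable merely refers to the function object, whose own __name__ and __qualname__ are still '<lambda>'. The other two functions can be found by name (fun as __main__.fun, print as builtins.print), so they can be pickled.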
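The name lookup described above can be approximated in a few lines. This is only a rough sketch: resolves_by_name is a hypothetical helper written for illustration, and the real check inside the pickle module handles more cases. os.path.join is used as an example of an ordinary def-defined function because it lives in a real importable module.

```python
import os.path
import sys


def resolves_by_name(func):
    """Hypothetical helper: roughly mimic pickle's "can I find this by name?" check."""
    module = sys.modules.get(func.__module__)
    obj = module
    # __qualname__ can be dotted (e.g. "Class.method"), so walk it part by part.
    for part in func.__qualname__.split("."):
        obj = getattr(obj, part, None)
    return module is not None and obj is func


lmb = lambda x: print(x)

print(resolves_by_name(print))         # True: found as builtins.print
print(resolves_by_name(os.path.join))  # True: found in its defining module
print(lmb.__name__, lmb.__qualname__)  # <lambda> <lambda>
print(resolves_by_name(lmb))           # False: binding it to lmb does not rename it
```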

Note that __main__ is the same name you’d test for in:

if __name__ == "__main__":

Answered By: Grismar

If you check the result of the future, you will see the relevant error message:

>>> from concurrent.futures import ProcessPoolExecutor
>>> p = ProcessPoolExecutor()
>>> p.submit(lambda x: print(x), "something").result()
...
PicklingError: Can't pickle <function <lambda> at 0x113eec5e0>: attribute lookup <lambda> on __main__ failed

So this error message is pretty self-explanatory if you’re familiar with how a process pool executor works. If you’re not, some more explanation may help: the function is pickled by reference, so it has to be findable by name in its module namespace. Lambdas are "anonymous" functions; their name is just '<lambda>', which is not a valid name to look up in the module namespace.

>>> (lambda x: print(x)).__name__
'<lambda>'
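This also helps explain why the original submit() call printed nothing instead of raising. As far as I can tell from CPython's implementation, ProcessPoolExecutor pickles each call on a background thread (through its underlying multiprocessing queue) and stores any failure on the returned future, so submit() itself returns normally. The sketch below imitates that with a ThreadPoolExecutor that only runs pickle.dumps; pickling_error is a hypothetical helper for illustration, not part of concurrent.futures.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


def pickling_error(func):
    """Hypothetical helper: pickle func on a worker thread and return the error, if any."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(pickle.dumps, func)  # submit() itself does not raise
        # exception() waits for the task, then returns the stored error or None
        return future.exception()


print(pickling_error(print))  # None
print(type(pickling_error(lambda x: print(x))).__name__)
```

Calling future.result() instead of future.exception() re-raises the stored error, which is what the REPL session above does.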

As a workaround, you could use a pathos multiprocessing pool, which uses dill, a more powerful serialization library than pickle. Unlike pickle, dill is able to serialize lambdas. The ProcessPool interface in pathos is a little different from the stdlib concurrent.futures.ProcessPoolExecutor, but the closest analogy to your simple submit usage would be its pipe method:

>>> from pathos.multiprocessing import ProcessPool
>>> p = ProcessPool()
>>> p.pipe(lambda x: print(x), "something")
something

If you’re curious how dill is able to dump lambdas, turn on tracing and look at what it actually does with this snippet:

import dill.detect
dill.detect.trace(True)
dill.dumps(lambda x: print(x))
Answered By: wim