Replace pickle in Python multiprocessing lib

Question:

I need to execute the code below (simplified version of my real code base in Python 3.5):

import multiprocessing
def forever(do_something=None):
    while True:
        do_something()

p = multiprocessing.Process(target=forever, args=(lambda: print("do  something"),))
p.start()

To create the new process, Python needs to pickle the target function and the lambda passed to it as an argument.
Unfortunately, pickle cannot serialize lambdas, and the output looks like this:

_pickle.PicklingError: Can't pickle <function <lambda> at 0x00C0D4B0>: attribute lookup <lambda> on __main__ failed
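
For context, the failure comes from how pickle handles functions: it stores only a reference (module name plus qualified name) and looks that name up again on load. A lambda has no importable name, so the lookup fails. A minimal sketch using only the standard library; `can_pickle` is a hypothetical helper name, not part of any library:

```python
import math
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # PicklingError is raised for a module-level lambda; some Python
        # versions raise AttributeError for a lambda defined inside a function.
        return False


# A named, importable function is pickled by reference, so loading it
# returns the very same object.
same_object = pickle.loads(pickle.dumps(math.sqrt)) is math.sqrt

print(same_object)
print(can_pickle(math.sqrt))
print(can_pickle(lambda: print("do something")))
```

The last call is the same situation as in the traceback above: the name `<lambda>` cannot be found as an attribute of any module.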

I discovered cloudpickle, which can serialize and deserialize lambdas and closures using the same interface as pickle.

How can I force the Python multiprocessing module to use cloudpickle instead of pickle?

Clearly hacking the code of the standard lib multiprocessing is not an option!

Thanks

Charlie

Asked By: Charlie

Answers:

Try multiprocess. It’s a fork of multiprocessing that uses the dill serializer instead of pickle — there are no other changes in the fork.

I’m the author. I encountered the same problem as you several years ago, and ultimately I decided that hacking the standard library was my only choice, as some of the pickle code in multiprocessing is in C.

>>> import multiprocess as mp
>>> p = mp.Pool()
>>> p.map(lambda x:x**2, range(4))
[0, 1, 4, 9]
>>> 
Answered By: Mike McKerns

If you’re willing to do a little monkeypatching, a quick fix is to sub out the pickle.Pickler:

import pickle
import cloudpickle
pickle.Pickler = cloudpickle.Pickler

or, in more recent versions of Python where _pickle.Pickler is pulled in,

from multiprocessing import reduction
import cloudpickle
reduction.ForkingPickler = cloudpickle.Pickler

Just make sure to do this before importing multiprocessing. Here’s a full example:

import pickle
import cloudpickle
pickle.Pickler = cloudpickle.Pickler

import multiprocessing as mp
mp.set_start_method('spawn', True)

def procprint(f):
    print(f())

if __name__ == '__main__':
    p = mp.Process(target=procprint, args=(lambda: "hello",))
    p.start()
    p.join()

As an aside, you won’t need to do any of this if your start method is fork, since with forking nothing needs to be pickled in the first place.
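
A minimal sketch of the fork case, assuming a Unix-like platform where the fork start method is available. `double_in_forked_child` is a hypothetical helper invented for illustration; the result comes back through a pipe, so only a plain integer is ever pickled:

```python
import multiprocessing as mp


def double_in_forked_child(n):
    """Run a lambda in a forked child process and return its result.

    Returns None where the fork start method is unavailable (e.g. Windows).
    """
    if "fork" not in mp.get_all_start_methods():
        return None
    ctx = mp.get_context("fork")
    # One-way pipe: the first end only receives, the second only sends.
    parent_end, child_end = ctx.Pipe(duplex=False)
    # The lambda is never pickled; the child inherits it through fork().
    p = ctx.Process(target=lambda: child_end.send(n * 2))
    p.start()
    result = parent_end.recv()
    p.join()
    return result


if __name__ == "__main__":
    print(double_in_forked_child(21))
```

Using `get_context("fork")` keeps the choice local to this helper instead of changing the start method for the whole program.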

Answered By: Andy Jones

I ran into the same problem, so I made a small module that lets Python's multiprocessing handle lambdas.

If you have many different unpicklable things, I would also recommend using dill or cloudpickle.

https://github.com/cloasdata/lambdser

pip install lambdser

Answered By: seimen

I had a similar problem of having to send data to the workers that can be cloudpickled but not normal-pickled.
But I wanted the multiprocessing to work with the normal pickle module for various reasons. I used this pattern:

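import cloudpickle

class FunctionWrapper:
    """Carries a callable as cloudpickle bytes so that plain pickle can transport it."""

    def __init__(self, fn):
        # Serialize eagerly; only the resulting bytes are stored on the instance.
        self.fn_ser = cloudpickle.dumps(fn)

    def __call__(self, *args, **kwargs):
        # Deserialize in the worker process and forward any arguments.
        fn = cloudpickle.loads(self.fn_ser)
        return fn(*args, **kwargs)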

Then you can pass your lambda, or whatever is causing the problem, like this (note that args must be a tuple):

p = multiprocessing.Process(target=forever, args=(FunctionWrapper(lambda: print("do  something")),))

The point is that the ‘meaningful’ serialization is happening outside the multiprocessing module with whatever library you want. The pickle in multiprocessing only sees a plain object with a single bytes attribute.
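
To illustrate that the serializer behind the wrapper is interchangeable, here is a rough sketch of the same pattern with the standard library's marshal module swapped in for cloudpickle, so it runs without third-party packages. `MarshalledFunction` is a hypothetical name, and this stand-in is far more limited than cloudpickle: it loses closures and default arguments, and the bytes are only valid for the same Python version.

```python
import marshal
import types


class MarshalledFunction:
    """Same idea as FunctionWrapper, with marshal standing in for cloudpickle."""

    def __init__(self, fn):
        # Keep only the compiled code as bytes; the function object is dropped.
        self.code_bytes = marshal.dumps(fn.__code__)

    def __call__(self, *args, **kwargs):
        # Rebuild a function from the stored bytes on every call.
        code = marshal.loads(self.code_bytes)
        fn = types.FunctionType(code, globals())
        return fn(*args, **kwargs)


wrapped = MarshalledFunction(lambda x: x * 2)
print(type(wrapped.code_bytes).__name__)
print(wrapped(21))
```

Because the instance holds nothing but bytes, the ordinary pickle module should be able to transport it as long as the class is defined at module level where the worker can import it.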

Answered By: julaine