Pool within a Class in Python

Question:

I would like to use Pool within a class, but there seems to be a problem. My code is long, so I created a small demo variant to illustrate the problem. It would be great if you could give me a variant of the code below that works.

from multiprocessing import Pool

class SeriesInstance(object):
    def __init__(self):
        self.numbers = [1,2,3]
    def F(self, x):
        return x * x
    def run(self):
        p = Pool()
        print p.map(self.F, self.numbers)


ins = SeriesInstance()
ins.run()

Outputs:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

And then hangs.

Asked By: user58925

Answers:

It looks like you can't use instance methods here because of how the function is passed to the worker processes: it has to be pickled, and the traceback shows that pickle can't serialize an instance method. My first thought was to use lambdas, but the built-in pickler can't serialize those either. The solution, sadly, is to use a function in the global namespace. As suggested in other answers, you can also use static methods and pass self explicitly to make it look more like an instance method.

from multiprocessing import Pool
from itertools import repeat

class SeriesInstance(object):
    def __init__(self):
        self.numbers = [1,2,3]

    def run(self):
        p = Pool()
        squares = p.map(self.F, self.numbers)
        multiples = p.starmap(self.G, zip(repeat(self), [2, 5, 10]))
        return (squares, multiples)

    @staticmethod
    def F(x):
        return x * x

    @staticmethod
    def G(self, m):
        return [m * n for n in self.numbers]

if __name__ == '__main__':
    print(SeriesInstance().run())
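
The "function in the global namespace" option mentioned above can be sketched as follows. This is a minimal sketch assuming Python 3; square and can_pickle are hypothetical names introduced here for illustration, not part of the original answer. The helper shows that pickle rejects a lambda, while the class simply hands a module-level function to the pool.

```python
import pickle
from multiprocessing import Pool


def square(x):
    # Module-level function: pickle stores only its name, which workers can look up.
    return x * x


def can_pickle(obj):
    # Hypothetical helper (an assumption for this sketch): True if pickle accepts obj.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


class SeriesInstance(object):
    def __init__(self):
        self.numbers = [1, 2, 3]

    def run(self):
        # The context manager shuts the pool down once map() has returned.
        with Pool() as p:
            return p.map(square, self.numbers)


if __name__ == '__main__':
    print(can_pickle(lambda x: x * x))  # lambdas have no importable name
    print(SeriesInstance().run())
```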
Answered By: Alex Sherman

You can also use multiprocessing with static functions in the class.

Answered By: stardust

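The error occurs because pickle can't serialize the bound instance method (the instancemethod type named in the traceback). A small workaround is to pass the function through the class, together with the instance as an explicit argument: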

from itertools import repeat
from multiprocessing import Pool


class SeriesInstance:
    def __init__(self):
        self.numbers = [1, 2, 3]

    def F(self, x):
        return x * x

    def run(self):
        p = Pool()
        print(list(p.starmap(SeriesInstance.F, zip(repeat(self), self.numbers))))


if __name__ == '__main__':
    SeriesInstance().run()
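
To make the workaround above easier to follow, here is a minimal sketch of what starmap receives. The argument_tuples helper is a hypothetical name added for illustration. Each task gets one (instance, number) pair, and starmap unpacks it into a call to SeriesInstance.F(instance, number). Note that the instance itself is pickled and copied to the workers, so this assumes all of its attributes are picklable, and changes made to it in a worker do not reach the parent process.

```python
from itertools import repeat
from multiprocessing import Pool


class SeriesInstance:
    def __init__(self):
        self.numbers = [1, 2, 3]

    def F(self, x):
        return x * x

    def argument_tuples(self):
        # Hypothetical helper: one (instance, number) pair per task.
        return list(zip(repeat(self), self.numbers))

    def run(self):
        with Pool() as p:
            # starmap unpacks each pair into SeriesInstance.F(instance, number).
            return p.starmap(SeriesInstance.F, self.argument_tuples())


if __name__ == '__main__':
    print(SeriesInstance().run())
```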

There are many posts on Stack Overflow about this issue, which happens for varying reasons. In my case, I was trying to call pool.starmap from inside a class on another method of the same class. Making it a staticmethod, or having a function outside the class call it, didn't work and gave the same error. My class instance just couldn't be pickled, so the instance has to be created after the multiprocessing starts, inside the worker process.

What I ended up doing that worked for me was to separate my class into two classes. Something like this:

from multiprocessing import Pool

class B:
    ...
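    def process_feature(self, idx, feature):
        # do stuff in the new process
        pass
    ...

def multiprocess_feature(idx, feature):
    # starmap unpacks each (idx, feature) tuple into these two arguments.
    # B is instantiated here, inside the worker, so it is never pickled.
    b_instance = B()
    return b_instance.process_feature(idx, feature)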

class A:
    ...
    def process_stuff(self):
        ...
        with Pool(processes=num_processes, maxtasksperchild=10) as pool:
            results = pool.starmap(
                multiprocess_feature,
                [
                    (idx, feature)
                    for idx, feature in enumerate(features)
                ],
                chunksize=100,
            )
        ...
    ...

...
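
A runnable version of the two-class outline above might look like the sketch below. The elided parts of the original are not known, so everything filling them in is an assumption made for illustration: the features list, the upper-casing "work" in process_feature, and the default of two processes.

```python
from multiprocessing import Pool


class B:
    def process_feature(self, idx, feature):
        # Placeholder work (an assumption): tag each feature with its index.
        return (idx, feature.upper())


def multiprocess_feature(idx, feature):
    # Runs in the worker: B is created here, so no B instance is pickled.
    return B().process_feature(idx, feature)


class A:
    def __init__(self, features):
        # Hypothetical input data; only plain strings cross the process boundary.
        self.features = features

    def process_stuff(self, num_processes=2):
        with Pool(processes=num_processes) as pool:
            # Each (idx, feature) tuple becomes one call to multiprocess_feature.
            return pool.starmap(multiprocess_feature, list(enumerate(self.features)))


if __name__ == '__main__':
    print(A(["red", "green", "blue"]).process_stuff())
```

Only the module-level function and its plain arguments are sent to the workers, which is why this layout avoids the pickling error even when the class itself holds unpicklable state.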
Answered By: Akaisteph7