How can I append to class variables using multiprocessing in Python?

Question:

I have a program where everything is built into a class. One method runs 50 computations of another function, each with a different input, so I decided to use multiprocessing to speed it up. However, the list that should be returned at the end always comes back empty. Any ideas? Here is a simplified version of my problem. The output of main_function() should be a list containing the numbers 0-9, but the list is empty.

import multiprocessing

class MyClass(object):
    def __init__(self):
        self.arr = list()

    def helper_function(self, n):
        self.arr.append(n)

    def main_function(self):
        jobs = []

        for i in range(0,10):
            p = multiprocessing.Process(target=self.helper_function, args=(i,))
            jobs.append(p)
            p.start()

        for job in jobs:
            job.join()

        print(self.arr)

Asked By: Randy Maldonado

||

Answers:

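arr is a plain list, so it is not shared across processes: each child process works on its own copy of the object, and anything appended there never reaches the list in the parent.

To share it, you have to use a Manager object to create a managed list, which is designed to be accessed from several processes.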

The key is:

self.arr = multiprocessing.Manager().list()

Full working example:

import multiprocessing

class MyClass(object):
    def __init__(self):
        self.arr = multiprocessing.Manager().list()

    def helper_function(self, n):
        self.arr.append(n)

    def main_function(self):
        jobs = []

        for i in range(0,10):
            p = multiprocessing.Process(target=self.helper_function, args=(i,))
            jobs.append(p)
            p.start()

        for job in jobs:
            job.join()

        print(self.arr)

if __name__ == "__main__":
    a = MyClass()
    a.main_function()

This code now prints: [7, 9, 2, 8, 6, 0, 4, 3, 1, 5]

(The order cannot be relied on from one execution to the next, of course, but all the numbers are present, which means every process contributed to the result.)
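To see the difference between the two kinds of list side by side, here is a minimal sketch. The names append_value and run_workers are made up for this illustration and are not part of the original code, and the list is passed to each process as an argument instead of going through a class.

```python
import multiprocessing


def append_value(target, n):
    # Runs in the child process; 'target' is whatever the parent passed in.
    target.append(n)


def run_workers(target, count=5):
    """Start `count` processes that each append one number to `target`."""
    jobs = [
        multiprocessing.Process(target=append_value, args=(target, i))
        for i in range(count)
    ]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()


if __name__ == "__main__":
    # Plain list: every child appends to its own copy, so the parent sees nothing.
    plain = []
    run_workers(plain)
    print(plain)

    # Managed list: the children talk to the manager process through a proxy.
    with multiprocessing.Manager() as manager:
        shared = manager.list()
        run_workers(shared)
        print(sorted(shared))
```

The first print should show [] and the second [0, 1, 2, 3, 4] (sorted here only to make the output stable).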

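multiprocessing is touchy.

For simple tasks like this one, I would recommend the thread pool from multiprocessing.dummy. It offers the same Pool API but runs the work in threads, which share memory, so a plain list works. Keep in mind that threads in standard CPython do not speed up CPU-bound pure-Python code.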

from multiprocessing.dummy import Pool as ThreadPool


class MyClass(object):
    def __init__(self):
        self.arr = list()

    def helper_function(self, n):
        self.arr.append(n)

    def main_function(self):
        # The with block shuts the pool down once map() has returned.
        with ThreadPool(4) as pool:
            pool.map(self.helper_function, range(10))
        print(self.arr)


if __name__ == '__main__':
    c = MyClass()
    c.main_function()

The idea of using map instead of complicated multithreading calls is from one of my favorite blog posts: https://chriskiehl.com/article/parallelism-in-one-line
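Since map already returns one result per input, a variation on the code above is to have the helper return its value and skip the shared list altogether. This is my own sketch, not something the answer or the blog post spells out, and square is a hypothetical stand-in for the real computation.

```python
from multiprocessing.dummy import Pool as ThreadPool


def square(n):
    # Hypothetical stand-in for the real per-item computation.
    return n * n


def compute_all(values, workers=4):
    """Apply `square` to every value and return the results in input order."""
    with ThreadPool(workers) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print(compute_all(range(10)))
```

pool.map returns results in input order, so this prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]. If the work is CPU-bound, replacing the import with from multiprocessing import Pool should run the same code in separate processes, as long as the function is defined at module level so it can be pickled.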

Answered By: James Gabriel