How can I append to class variables using multiprocessing in Python?
Question:
I have a program where everything is built into a class. One method runs 50 computations of another function, each with a different input, so I decided to use multiprocessing to speed it up. However, the list that should be returned at the end is always empty. Any ideas? Here is a simplified version of my problem. The output of main_function() should be a list containing the numbers 0-9, but the list comes back empty.
import multiprocessing

class MyClass(object):
    def __init__(self):
        self.arr = list()

    def helper_function(self, n):
        self.arr.append(n)

    def main_function(self):
        jobs = []
        for i in range(0,10):
            p = multiprocessing.Process(target=self.helper_function, args=(i,))
            jobs.append(p)
            p.start()
        for job in jobs:
            job.join()
        print(self.arr)
Answers:
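arr is a plain list, so it is not shared across subprocess instances. Each process works on its own copy of the object, and appends made in a child never reach the parent's list. To share it you have to use a Manager object to create a managed list that is aware of the fact that it's shared between processes.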
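To see the difference in isolation, here is a minimal sketch (the helper names append_value, run_appends and compare_plain_and_managed are made up for illustration, they are not part of the question's code). It runs the same appends against a plain list and against a managed list, then reports what the parent process sees in each case:

```python
import multiprocessing


def append_value(target, n):
    # Runs in the child process; `target` is whatever the child received.
    target.append(n)


def run_appends(target, count=3):
    """Start `count` processes that each append one number to `target`."""
    jobs = [
        multiprocessing.Process(target=append_value, args=(target, i))
        for i in range(count)
    ]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    # Return what the parent sees, sorted because finish order is arbitrary.
    return sorted(target)


def compare_plain_and_managed():
    # Plain list: every child appends to its own copy.
    plain = run_appends([])
    # Managed list: the proxy forwards each append to the manager process.
    with multiprocessing.Manager() as manager:
        managed = run_appends(manager.list())
    return plain, managed


if __name__ == "__main__":
    plain, managed = compare_plain_and_managed()
    print(plain)    # []
    print(managed)  # [0, 1, 2]
```

The plain list stays empty in the parent because each child appended to its own copy, while the managed list collects all three values.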
The key is:
self.arr = multiprocessing.Manager().list()
Full working example:
import multiprocessing

class MyClass(object):
    def __init__(self):
        # Managed list: appends made in child processes are visible here
        self.arr = multiprocessing.Manager().list()

    def helper_function(self, n):
        self.arr.append(n)

    def main_function(self):
        jobs = []
        for i in range(0,10):
            p = multiprocessing.Process(target=self.helper_function, args=(i,))
            jobs.append(p)
            p.start()
        # Wait for every process before reading the result
        for job in jobs:
            job.join()
        print(self.arr)

if __name__ == "__main__":
    a = MyClass()
    a.main_function()
This code now prints: [7, 9, 2, 8, 6, 0, 4, 3, 1, 5]
(The order cannot be relied on from one run to the next, but all the numbers are there, which means every process contributed to the result.)
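If the workers only need to hand results back, another option is to avoid shared state altogether and let a process pool return the values. This is a sketch rather than part of the answer above: compute is a hypothetical stand-in for the real computation, and it is defined at module level so it can be pickled and sent to the workers.

```python
import multiprocessing


def compute(n):
    # Hypothetical stand-in for the real per-input computation.
    return n * n


class MyClass(object):
    def __init__(self):
        self.arr = []

    def main_function(self):
        # Workers return their results; only the parent assigns self.arr.
        with multiprocessing.Pool(processes=4) as pool:
            # pool.map returns results in the same order as the inputs.
            self.arr = pool.map(compute, range(10))
        return self.arr


if __name__ == "__main__":
    print(MyClass().main_function())
```

Because map keeps the input order, the result is deterministic here, unlike the append-based version.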
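multiprocessing is touchy.
For simple tasks, I would recommend multiprocessing.dummy, which offers the same Pool API backed by threads instead of processes. Threads share the parent's memory, so a plain list works, though CPU-bound pure-Python work generally won't run faster under CPython's GIL: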
from multiprocessing.dummy import Pool as ThreadPool

class MyClass(object):
    def __init__(self):
        self.arr = list()

    def helper_function(self, n):
        self.arr.append(n)

    def main_function(self):
        pool = ThreadPool(4)
        pool.map(self.helper_function, range(10))
        print(self.arr)

if __name__ == '__main__':
    c = MyClass()
    c.main_function()
The idea of using map instead of complicated multithreading calls is from one of my favorite blog posts: https://chriskiehl.com/article/parallelism-in-one-line
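Following the same map idea, the pool can also return the results directly instead of appending inside the helper. A small sketch, assuming helper_function just returns its value (a placeholder for the real work):

```python
from multiprocessing.dummy import Pool as ThreadPool


class MyClass(object):
    def helper_function(self, n):
        # Placeholder for the real work; returns a value instead of appending.
        return n

    def main_function(self):
        with ThreadPool(4) as pool:
            # map blocks until all calls finish and keeps the input order.
            return pool.map(self.helper_function, range(10))


if __name__ == "__main__":
    print(MyClass().main_function())
```

This way the result order matches the inputs and no list is mutated from several threads at once.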