Multiprocessing: Instantiate Processes individually
Question:
I have an embarrassingly parallel problem in a reinforcement-learning context. I would like to let the neural network generate data in parallel. To achieve that, each process needs its own model.
I have tried to use Pool to achieve this, but now I am not sure if this is the correct method.
from multiprocessing import Pool

def run():
    with Pool(processes=8) as p:
        result = p.map_async(f, range(8))
        p.close()
        p.join()
        print(result.get())

def f(x):
    return x * x

if __name__ == '__main__':
    run()
I know that you can use an initializer to set up the worker processes, but I think that is meant for giving every process the same fixed data.
model = None

def worker_init():
    global model
    model = CNN()
This does not work. So how can I give every Process its own model?
Answers:
Well, you actually are creating different objects; they just appear to have the same id because each one lives at the same virtual address inside its own process. Still, the proper way to create individual workers with their own "storage" is to subclass multiprocessing.Process instead of using global variables.
A pool is better suited to stateless work that doesn't need per-worker memory, or to limiting the amount of work that is submitted at one time.
from multiprocessing import Process, Queue
import random

class CNN:
    def __init__(self):
        self.value = random.randint(0, 100)

    def __repr__(self):
        return str(self.value)

class Worker(Process):
    def __init__(self, identification, return_queue: Queue):
        super().__init__(daemon=True)
        self.id = identification
        self.model = None
        self.return_queue = return_queue

    def run(self) -> None:
        self.model = CNN()
        self.return_queue.put((self.id, self.model))

def run():
    return_queue = Queue()
    workers = []
    for i in range(8):
        worker = Worker(i, return_queue)
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()
    while not return_queue.empty():
        res = return_queue.get()
        print("id =", res[0], ", content =", res[1])

if __name__ == '__main__':
    run()
id = 0 , content = 72
id = 2 , content = 0
id = 1 , content = 95
id = 4 , content = 51
id = 5 , content = 83
id = 6 , content = 91
id = 3 , content = 7
id = 7 , content = 78
You don't really need to join them all before processing results. If you know how many items to expect in the queue, you can poll the queue for exactly that number of returns and skip the joining. You could also spin up an asyncio event loop to wait for process exit and poll the queue at the same time. The posted code is simply the safest option in case a process crashes, without having to run an asyncio event loop.
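The drain-by-count variant can be sketched like this, reusing the Worker and CNN classes from above (the 30-second timeout is an arbitrary choice to avoid hanging forever if a worker crashes):

```python
from multiprocessing import Process, Queue
import random

class CNN:
    def __init__(self):
        self.value = random.randint(0, 100)

class Worker(Process):
    def __init__(self, identification, return_queue):
        super().__init__(daemon=True)
        self.id = identification
        self.return_queue = return_queue

    def run(self):
        # Each process builds its own model and ships a result back.
        self.return_queue.put((self.id, CNN()))

if __name__ == '__main__':
    n = 8
    return_queue = Queue()
    workers = [Worker(i, return_queue) for i in range(n)]
    for w in workers:
        w.start()
    # Pull exactly n results; no need to join before consuming.
    # The timeout raises queue.Empty if a worker died silently.
    results = [return_queue.get(timeout=30) for _ in range(n)]
    for w in workers:
        w.join()
    print(sorted(ident for ident, _ in results))
```

Draining by count also sidesteps a subtle pitfall of join-then-drain: a child cannot exit until everything it put on the queue has been flushed, so joining first can deadlock with large payloads.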