Multiprocessing: Instantiate Processes individually

Question:

I have an embarrassingly parallel problem in a reinforcement-learning context. I would like to let the neural network generate data in parallel, and to achieve that, each process needs its own model.

I have tried to use Pool to achieve this, but now I am not sure if this is the correct method.

from multiprocessing import Pool

def run():
    with Pool(processes=8) as p:
        result = p.map_async(f, range(8))
        p.close()
        p.join()
        print(result.get())


def f(x):
    return x*x


if __name__ == '__main__':
    run()

I know that you can use an initializer to set up the processes, but I think this is used to set up the processes with the same fixed data.

model = None

def worker_init():
    global model
    model = CNN()

This does not work. So how can I give every Process its own model?

Asked By: Nima Mousavi


Answers:

Well, you actually are creating different objects; they only appear to share an id because each process has its own virtual address space, so the same address can show up in different processes while still referring to different objects. That said, the proper way to create individual workers that have their own "storage" is to subclass multiprocessing.Process instead of relying on global variables.
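For instance, here is a minimal sketch of that point (using a toy CNN stand-in and the initializer idea from the question, not the approach recommended below): each worker process builds its own model in worker_init, and printing the process id next to id(model) shows the models are separate objects even if the id() values happen to coincide across processes.

from multiprocessing import Pool
import os
import random

model = None

class CNN:
    def __init__(self):
        # toy stand-in for a real network: just hold a random value
        self.value = random.randint(0, 100)

def worker_init():
    # runs once per worker process; the global is process-local
    global model
    random.seed(os.getpid())  # re-seed so the values differ even under fork
    model = CNN()

def f(x):
    # each call sees only the model of the worker process it runs in
    return os.getpid(), id(model), model.value, x * x

if __name__ == '__main__':
    with Pool(processes=4, initializer=worker_init) as p:
        for pid, model_id, value, square in p.map(f, range(8)):
            print(pid, model_id, value, square)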

A pool is better suited to stateless (memory-less) work, or to limiting the amount of work that is submitted at one time.
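To illustrate that, a small sketch (just a toy, stateless square function, nothing from the question) where imap_unordered pulls tasks from the input lazily, one chunk at a time, so only a bounded amount of work is in flight:

from multiprocessing import Pool

def square(x):
    # stateless work: no per-worker setup, no shared state
    return x * x

if __name__ == '__main__':
    with Pool(processes=4) as p:
        # imap_unordered consumes the input lazily in chunks and yields
        # results as they complete, unlike map(), which submits everything
        for result in p.imap_unordered(square, range(100), chunksize=10):
            print(result)

If every worker needs its own long-lived state, however, subclassing Process keeps that state as an instance attribute: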

from multiprocessing import Process, Queue
import random


class CNN:
    # stand-in for a real model: just holds a random value so the
    # instances are visibly different
    def __init__(self):
        self.value = random.randint(0, 100)

    def __repr__(self):
        return str(self.value)


class Worker(Process):
    def __init__(self, identification, return_queue: Queue):
        super().__init__(daemon=True)
        self.id = identification
        self.model = None
        self.return_queue = return_queue

    def run(self) -> None:
        # run() executes in the child process, so this model lives in
        # the worker's own memory, not the parent's
        self.model = CNN()
        self.return_queue.put((self.id, self.model))


def run():
    return_queue = Queue()
    workers = []
    for i in range(8):
        worker = Worker(i, return_queue)
        worker.start()
        workers.append(worker)
    # wait for all workers to finish, then drain whatever they returned
    for worker in workers:
        worker.join()
    while not return_queue.empty():
        res = return_queue.get()
        print("id =", res[0], ", content =", res[1])


if __name__ == '__main__':
    run()

Sample output:
id = 0 , content = 72
id = 2 , content = 0
id = 1 , content = 95
id = 4 , content = 51
id = 5 , content = 83
id = 6 , content = 91
id = 3 , content = 7
id = 7 , content = 78

You don't really need to join all the workers before processing results. If you know how many items to expect in the queue, you can simply get() exactly that many results and skip the joining; you could also spin up an asyncio event loop to wait for process exit and poll the queue at the same time. The code posted above is just the safest option in case a process crashes, without having to run an asyncio event loop.
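For example, a variant of run() (reusing the Worker class above, and assuming every worker puts exactly one item on the queue) that reads exactly eight results and never calls join():

def run():
    return_queue = Queue()
    workers = [Worker(i, return_queue) for i in range(8)]
    for worker in workers:
        worker.start()
    # one result per worker is expected, so block on get() that many times;
    # if a worker crashed before putting, this would wait forever, which is
    # why the version above joins first
    for _ in range(len(workers)):
        worker_id, model = return_queue.get()
        print("id =", worker_id, ", content =", model)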

Answered By: Ahmed AEK