How to maintain global processes in a pool working recursively?

Question:

I want to implement a recursive parallel algorithm, and I want the pool to be created only once. At each time step it should do a job, wait for all the jobs to finish, and then call the processes again with the previous outputs as inputs, and so on at the next time step.

My problem is that I have implemented a version where the pool is created and killed at every time step, but this is extremely slow, even slower than the sequential version. When I try to implement a version where the pool is created only once at the beginning, I get an assertion error when I call join().

This is my code:

def log_result(result):
    tempx, tempb, u = result
    X[:,u,np.newaxis], b[:,u,np.newaxis] = tempx, tempb


workers = mp.Pool(processes=4)
for t in range(p, T):

    count = 0  # ==========This is only master's job=============
    for l in range(p):
        for k in range(4):
            gn[count] = train[t-l-1, k]
            count += 1
    G = G*v + gn @ gn.T  # ==================================

    if __name__ == '__main__':
        for i in range(4):
            workers.apply_async(OULtraining, args=(train[t,i], X[:,i,np.newaxis], b[:,i,np.newaxis], i, gn), callback=log_result)

        workers.join()

X and b are the matrices that I want to update directly in the master's memory.

What is wrong here, and why do I get the assertion error?

Can I implement what I want with a pool, or not?

Asked By: Bekromoularo


Answers:

You cannot join a pool that has not been closed first, because join() waits for the worker processes to terminate, not for jobs to complete (see the multiprocessing documentation, https://docs.python.org/3.6/library/multiprocessing.html, section 17.2.2.9).
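To make the required order of calls concrete, here is a minimal sketch (not part of the original post); on the Python version from the question, calling join() on a still-running pool is what produces the assertion error:

    import multiprocessing as mp

    pool = mp.Pool(processes=4)
    # pool.join()   # error here: join() is only legal after close() or terminate()
    pool.close()    # stop accepting new tasks
    pool.join()     # wait for the worker processes to exit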

But since closing the pool shuts it down for good, which is not what you want, you cannot use it here. So join() is out, and you need to implement a "wait until all jobs have completed" mechanism yourself.

One way of doing this without busy loops is to use a queue. You could also work with bounded semaphores, but they do not work on all operating systems (a semaphore-based sketch is shown after the queue example below).

import multiprocessing as mp

counter = 0
lock_queue = mp.Queue()
counter_lock = mp.Lock()

def log_result(result):
    # The callback runs in the main process, so it can update X and b in place.
    # "global" is needed because the callback reassigns the module-level counter.
    global counter
    tempx, tempb, u = result
    X[:,u,np.newaxis], b[:,u,np.newaxis] = tempx, tempb
    with counter_lock:
        counter += 1
        if counter == 4:
            counter = 0
            lock_queue.put(42)   # signal the main loop that all four jobs are done


workers = mp.Pool(processes=4)
for t in range(p, T):

    count = 0  # ==========This is only master's job=============
    for l in range(p):
        for k in range(4):
            gn[count] = train[t-l-1, k]
            count += 1
    G = G*v + gn @ gn.T  # ==================================

    if __name__ == '__main__':
        counter = 0
        for i in range(4):
            workers.apply_async(OULtraining, args=(train[t,i], X[:,i,np.newaxis], b[:,i,np.newaxis], i, gn), callback=log_result)

        lock_queue.get(block=True)   # block until the callback reports the last job

This resets a global counter before submitting the jobs. As soon as a job completes, your callback increments the global counter. When the counter hits 4 (your number of jobs), the callback knows it has processed the last result, and a dummy message is put on the queue. Your main program waits at Queue.get() for something to appear there.

This allows your main program to block until all jobs have completed, without closing down the pool.
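For completeness, the semaphore variant mentioned above could look roughly like this. This is a sketch rather than part of the original answer, and it uses a plain multiprocessing.Semaphore started at zero instead of a bounded one: the callback releases it once per finished job, and the main loop acquires it four times.

    import multiprocessing as mp

    jobs_done = mp.Semaphore(0)   # starts at 0: every release() marks one finished job

    def log_result(result):
        tempx, tempb, u = result
        X[:,u,np.newaxis], b[:,u,np.newaxis] = tempx, tempb
        jobs_done.release()       # one more job finished

    # inside the time-step loop, after submitting the four jobs:
    for _ in range(4):
        jobs_done.acquire()       # block until all four callbacks have run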

If you replace multiprocessing.Pool with ProcessPoolExecutor from concurrent.futures, you can skip this part and use

concurrent.futures.wait(fs, timeout=None, return_when=ALL_COMPLETED)

to block until all submitted tasks have finished. From a functional standpoint there is no difference between these approaches; the concurrent.futures method is a couple of lines shorter, but the result is exactly the same.
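A minimal sketch of that variant, assuming the same OULtraining, train, X, b and gn from the question, might look like this:

    from concurrent.futures import ProcessPoolExecutor, wait, ALL_COMPLETED

    with ProcessPoolExecutor(max_workers=4) as executor:
        for t in range(p, T):
            # ... build gn and G exactly as before ...
            fs = [executor.submit(OULtraining, train[t,i], X[:,i,np.newaxis], b[:,i,np.newaxis], i, gn)
                  for i in range(4)]
            wait(fs, timeout=None, return_when=ALL_COMPLETED)   # block until all four futures are done
            for f in fs:
                tempx, tempb, u = f.result()                     # collect results in the parent process
                X[:,u,np.newaxis], b[:,u,np.newaxis] = tempx, tempb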

Answered By: Hannu