Multiprocessing nested loops in Python?

Question:

I have a nested for loop and I’d like to parallelize it as much as possible in Python.

Suppose I have some arbitrary function that accepts two arguments, func(a,b), and I’d like to compute this function on all combinations of elements from two lists, M and N.

What I’ve done so far is ‘flatten’ the indices into a dictionary:

idx_map = {} 
count = 0
for i in range(n):
    for j in range(m):
        idx_map[count] = (i,j)
        count += 1 

Now that my nested loop is flattened, I can use it like so:

arr = []
for idx in range(n*m):
    i,j = idx_map[idx]
    arr.append( func(M[i], N[j]) )  

Can I use Python’s built-in multiprocessing to parallelize this? Race conditions should not be an issue because I do not need to aggregate the func calls; I just want to arrive at a final array that evaluates func(a,b) for every combination across M and N. (So async behavior and its complexity should not be relevant here.)

What’s the best way to accomplish this?

I found this related question, but I don’t understand what the author was trying to illustrate:

if 1:   # multi-threaded
    pool = mp.Pool(28) # try 2X num procs and inc/dec until cpu maxed
    st = time.time()
    for x in pool.imap_unordered(worker, range(data_Y)):
        pass
    print('Multiprocess total time is %4.3f seconds' % (time.time()-st))
    print()
Asked By: jbuddy_13


Answers:

Yes, you can accomplish this; however, the amount of work done per function call needs to be quite substantial to overcome the overhead of creating and communicating with the processes.

Vectorizing with something like numpy is typically easier, as Jérôme stated previously.
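For instance, if func can be expressed with numpy operations, broadcasting evaluates every (i, j) pair at once with no explicit Python loop. Here is a minimal sketch, assuming a made-up func(a, b) = a*b + a**2 rather than your real function:

import numpy as np

# Stand-in data and a made-up func(a, b) = a*b + a**2, purely for illustration
M = np.arange(5)
N = np.arange(3)

# M[:, None] has shape (5, 1) and N[None, :] has shape (1, 3), so broadcasting
# produces a (5, 3) array with one entry per (i, j) combination -- no index
# flattening and no Python-level loop required.
result = M[:, None] * N[None, :] + M[:, None] ** 2
print(result.shape)  # (5, 3)

That only works when func can be rewritten in terms of array operations; if it can’t, multiprocessing as shown below is the next option.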

I have altered your code so that you can observe the speed-up you get from multiprocessing.

Feel free to change the largNum variable: as the amount of work per function call increases, multiprocessing scales better, while at low values it is actually slower than the single-core version.

from concurrent.futures import ProcessPoolExecutor
import time

# Sums the integers from 0 to (a+b)**2 - 1, so the work grows with a and b
def costlyFunc(theArgs):
    a, b = theArgs
    topOfRange = (a + b) ** 2

    total = 0
    for i in range(topOfRange):
        total += i

    return total
            
# Changed to a list of (i, j) pairs
idx_map = []
largNum = 200

# Your index flattening
for i in range(largNum):
    for j in range(largNum):
        idx_map.append((i, j))

In the single-core version I use the built-in map function to call costlyFunc on every element in the list. ProcessPoolExecutor from concurrent.futures has a similar map method, except it distributes the calls over multiple processes.

if __name__ == "__main__":
    # No multiprocessing
    oneCoreTimer = time.time()
    result = list(map(costlyFunc, idx_map))
    oneCoreTime = time.time() - oneCoreTimer
    print(f"{oneCoreTime:.3f} seconds to complete the function without multiprocessing")

    # Multiprocessing
    mpTimer = time.time()
    with ProcessPoolExecutor() as ex:
        mpResult = list(ex.map(costlyFunc, idx_map))
    mpTime = time.time() - mpTimer
    print(f"{mpTime:.3f} seconds to complete the function with multiprocessing")

    print(f"Multiprocessing is {oneCoreTime/mpTime} times faster than using one core")
Answered By: TaylorOxelgren