Which multiprocessing method should I use: map or apply_async?
Question:
I have a function:
```python
def movingWinStretch(u0, u1):
    # u0, u1 are 1D arrays
    # do a bunch of stuff to u0 and u1
    return C, epsArray, tSamp
```
When I do this workflow on smaller amounts of data, I use a couple of nested for loops to iterate through the data matrices and get the inputs u0, u1. I then append the outputs C, epsArray, and tSamp to lists after each call to movingWinStretch. That looks something like this:
```python
cArray = []
dtArray = []
tArray = []
for i in range(1, seisdata.shape[0]):
    for j in range(seisdata.shape[3]):
        u0 = seisdata[0, -1, :, j]
        u1 = seisdata[i, -1, :, j]
        C, dtot, tSamp = movingWinStretch(u0, u1)
        cArray.append(C)
        dtArray.append(dtot)
        tArray.append(tSamp)
```
Now I need to do this on a much larger amount of data and would like to get a speedup from the multiprocessing (mp) package if possible. I've written an iterator:
```python
def traceIterator(seisdatarray):
    for i in range(1, seisdatarray.shape[0]):
        for j in range(seisdatarray.shape[3]):
            u0 = seisdatarray[0, -1, :, j]
            u1 = seisdatarray[i, -1, :, j]
            yield u0, u1
```
which yields the inputs to my function. I've used the multiprocessing package once or twice and thought I would try something like this:
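```python
import multiprocessing as mp

num_proc = 8
pool = mp.Pool(processes=num_proc)
# iterate over the (u0, u1) pairs directly; zip(*traceIterator(...)) would transpose them
proc = [pool.apply_async(movingWinStretch, args=(u0, u1)) for u0, u1 in traceIterator(seisdata)]
results = [p.get() for p in proc]
```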
My issue is: how do I do the append step for each call to movingWinStretch now? Additionally, I don't think apply_async is the correct method to use. Perhaps map or starmap would be better choices, since I have multiple inputs?
Answers:
Yes, you can use map or starmap instead of apply_async. apply_async submits a single function call as a background task and immediately returns an AsyncResult; you call its get() method later to retrieve the result. map and starmap, on the other hand, apply a function to every item of an iterable in parallel and return all the results as a list.
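A minimal sketch of that difference, assuming a hypothetical two-argument `stretch` function as a stand-in for movingWinStretch and a hypothetical `pairs()` generator as a stand-in for traceIterator (both are toys, not the real computation). apply_async creates one task per call and hands back AsyncResult objects; starmap takes the whole iterable and blocks until the full result list is ready. Because the AsyncResult objects are kept in a list in submission order, both approaches produce the same ordered list here.

```python
from multiprocessing import Pool


def stretch(u0, u1):
    """Hypothetical stand-in for movingWinStretch: returns three values."""
    return u0 * u1, u0 - u1, u0 + u1


def pairs():
    """Hypothetical stand-in for traceIterator: yields (u0, u1) pairs."""
    for i in range(1, 3):
        for j in range(2):
            yield i, j


def with_apply_async(num_proc=2):
    with Pool(num_proc) as pool:
        # One task per call; apply_async returns an AsyncResult immediately.
        pending = [pool.apply_async(stretch, args=(u0, u1)) for u0, u1 in pairs()]
        # get() blocks until that particular task has finished.
        return [p.get() for p in pending]


def with_starmap(num_proc=2):
    with Pool(num_proc) as pool:
        # starmap unpacks each (u0, u1) tuple into stretch(u0, u1)
        # and blocks until every result is available.
        return pool.starmap(stretch, pairs())


if __name__ == "__main__":
    print(with_apply_async() == with_starmap())  # True
```

Note that get() is called inside the `with` block: leaving the block calls the pool's terminate(), so results still pending at that point would never arrive.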
To use map, your function would have to take a single argument (for example a tuple of inputs), because map passes each item of the iterable as one argument. starmap instead unpacks each tuple into separate arguments, so your existing two-argument movingWinStretch works unchanged with it, like this:
```python
inputs = [(u0, u1) for u0, u1 in traceIterator(seisdata)]
results = pool.starmap(movingWinStretch, inputs)
```
This will give you a list of tuples, where each tuple contains the outputs of the movingWinStretch function for a given pair of inputs.
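To see why only map needs a change to the function signature, here is a small sketch with a hypothetical two-argument `combine` function. map calls its function with each tuple as a single argument, so it needs a one-argument wrapper (`combine_packed`, also hypothetical) that unpacks the tuple itself, whereas starmap does that unpacking for you.

```python
from multiprocessing import Pool


def combine(u0, u1):
    """Hypothetical two-argument worker, shaped like movingWinStretch(u0, u1)."""
    return u0 + u1, u0 * u1


def combine_packed(pair):
    """One-argument wrapper for Pool.map: unpack the tuple ourselves."""
    u0, u1 = pair
    return combine(u0, u1)


def run(inputs, num_proc=2):
    with Pool(num_proc) as pool:
        via_map = pool.map(combine_packed, inputs)    # map calls f(pair)
        via_starmap = pool.starmap(combine, inputs)   # starmap calls f(*pair)
    return via_map, via_starmap


if __name__ == "__main__":
    via_map, via_starmap = run([(1, 2), (3, 4), (5, 6)])
    print(via_map)                 # [(3, 2), (7, 12), (11, 30)]
    print(via_map == via_starmap)  # True
```

Both calls return their results in input order, so either one can replace the apply_async list comprehension.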
To perform the append step for each call, you can pass the index of each input pair into movingWinStretch and have it return that index along with its outputs, like this:
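```python
def movingWinStretch(idx, u0, u1):
    # do a bunch of stuff to u0 and u1
    C, epsArray, tSamp = ...
    return idx, C, epsArray, tSamp
# pair every (u0, u1) with its position so the worker can echo it back
inputs = [(idx, u0, u1) for idx, (u0, u1) in enumerate(traceIterator(seisdata))]
results = pool.starmap(movingWinStretch, inputs)
```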
You can then modify the loop that processes the results to append the outputs to the appropriate lists using the index:
```python
cArray = []
dtArray = []
tArray = []
for idx, C, dtot, tSamp in results:
    cArray.append((idx, C))
    dtArray.append((idx, dtot))
    tArray.append((idx, tSamp))

# sort the lists by the index to restore the original order
cArray.sort()
dtArray.sort()
tArray.sort()

# extract the outputs from the sorted lists
cArray = [C for idx, C in cArray]
dtArray = [dtot for idx, dtot in dtArray]
tArray = [tSamp for idx, tSamp in tArray]
```
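Note that with map or starmap this sorting is not actually needed, because both return their results in the same order as the inputs. Tracking an index only matters if you collect results in completion order, for example with imap_unordered or with apply_async callbacks.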
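Carrying an explicit index is what lets you restore the input order when results come back in completion order, as they do with Pool.imap_unordered. A minimal sketch, assuming a hypothetical worker `indexed_work` with the same (idx, C, dtot, tSamp) return shape as the indexed movingWinStretch; since imap_unordered passes one argument per call, the worker receives the whole (idx, u0, u1) tuple and unpacks it.

```python
from multiprocessing import Pool


def indexed_work(task):
    """Hypothetical worker: task is (idx, u0, u1); echo idx with three outputs."""
    idx, u0, u1 = task
    return idx, u0 + u1, u0 - u1, u0 * u1


def run_unordered(pairs, num_proc=2):
    # Attach each pair's position so the order can be rebuilt afterwards.
    tasks = [(idx, u0, u1) for idx, (u0, u1) in enumerate(pairs)]
    with Pool(num_proc) as pool:
        # Results arrive in whatever order the workers finish.
        results = list(pool.imap_unordered(indexed_work, tasks))
    # Restore input order by sorting on the index only, then split into three lists.
    results.sort(key=lambda r: r[0])
    cArray = [r[1] for r in results]
    dtArray = [r[2] for r in results]
    tArray = [r[3] for r in results]
    return cArray, dtArray, tArray


if __name__ == "__main__":
    print(run_unordered([(1, 2), (3, 4), (5, 6)]))
    # ([3, 7, 11], [-1, -1, -1], [2, 12, 30])
```

Sorting with `key=lambda r: r[0]` compares only the integer indices, so it never tries to compare the payloads themselves (which matters if they are NumPy arrays).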
Pool.starmap preserves the input order in its output, so there's no need for artificial indices and sorting. You can also let zip do the extraction into the three output lists, something like the following:
```python
from multiprocessing import Pool

# Your movingWinStretch
def foo(u0, u1):
    return u0, u1, u0 + u1

# Your traceIterator
def arguments(n, m):
    for u0 in range(n):
        for u1 in range(m):
            yield u0, u1

if __name__ == "__main__":
    num_proc = 8
    # a, b, c correspond to your cArray, dtArray, tArray
    with Pool(num_proc) as pool:
        a, b, c = zip(*pool.starmap(foo, arguments(2, 3)))
    print(f"{a = }, {b = }, {c = }")
```
Result here:
```
a = (0, 0, 0, 1, 1, 1), b = (0, 1, 2, 0, 1, 2), c = (0, 1, 2, 1, 2, 3)
```
If you need lists, then do

```python
...
a, b, c = map(list, zip(*pool.starmap(foo, arguments(2, 3))))
```

instead (or use a comprehension if you don't like map).
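For completeness, here is the comprehension variant mentioned above as a self-contained sketch. It reuses the answer's toy `foo` and `arguments` (stand-ins for movingWinStretch and traceIterator); `collect` is a hypothetical helper name. `zip(*results)` transposes the list of 3-tuples into three columns, and the comprehension turns each column into a list.

```python
from multiprocessing import Pool


# Toy stand-in for movingWinStretch
def foo(u0, u1):
    return u0, u1, u0 + u1


# Toy stand-in for traceIterator
def arguments(n, m):
    for u0 in range(n):
        for u1 in range(m):
            yield u0, u1


def collect(n, m, num_proc=2):
    with Pool(num_proc) as pool:
        results = pool.starmap(foo, arguments(n, m))
    # zip(*results) transposes rows of (C, dtot, tSamp) into three columns
    a, b, c = [list(column) for column in zip(*results)]
    return a, b, c


if __name__ == "__main__":
    a, b, c = collect(2, 3)
    print(f"{a = }, {b = }, {c = }")
    # a = [0, 0, 0, 1, 1, 1], b = [0, 1, 2, 0, 1, 2], c = [0, 1, 2, 1, 2, 3]
```

If the input iterator can be empty, `zip(*results)` yields nothing and the three-way unpacking raises ValueError, so that case needs a separate guard.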