How to convert nested loops with a body into parallelized iterables for multiprocessing
Question:
I have the two nested loops shown below. I want to turn them into an iterable that can be passed to the .map method so that their execution is parallelized. I am familiar with the following notation:
    with PoolExec(max_workers=int(config['MULTIPROCESSING']['proceses_count']),
                  initializer=self.initPool,
                  initargs=(arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9,)) as GridCells10mX10mIteratorPool.__poolExec:
        self.__chunkSize = PoolUtils.getChunkSizeForLenOfIterables(
            lenOfIterablesList=self.__maxNumOfCellsVertically * self.__maxNumOfCellsHorizontally,
            cpuCount=int(config['MULTIPROCESSING']['cpu_count']))
        for res in GridCells10mX10mIteratorPool.__poolExec.map(
                self.run,
                [(i, j) for i in range(0, 1800, 10) for j in range(0, 2000, 10)],
                chunksize=self.__chunkSize):
However, as the code below shows, there are two lines of code after the outer loop header and another two lines after the inner one. How can I convert these two loops to the notation above?
Code:
    for x in range(row, row + gVerticalStep):
        if rowsCnt == gVerticalStep:
            rowsCnt = 0
        for y in range(col, col + gHorizontalStep):
            if colsCnt == gHorizontalStep:
                colsCnt = 0
Answers:
In general, if you want to convert a double (or triple) for loop into a single iterable, you can use the itertools library.
Specifically, itertools.product() (see the itertools documentation) combines two or more iterables into a single one that yields every combination as a tuple, in the same order the nested loops would visit them.
For example, the following code snippet:
    for x in range(1, 5):
        for y in range(1, 5):
            print((x, y))
does the same thing as this:
    import itertools

    iterable = itertools.product(range(1, 5), range(1, 5))
    for x, y in iterable:
        print((x, y))
The notable difference is that both for loops are now a single iterable, which can be passed to whatever multiprocessing function you are trying to use.
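As a rough illustration of that last step, here is a minimal sketch that feeds itertools.product() to a process pool. It assumes the standard library's concurrent.futures.ProcessPoolExecutor stands in for PoolExec, and process_cell is a hypothetical worker, not the asker's self.run. Note that map() calls the worker with one item at a time, so the worker unpacks the (x, y) tuple itself.

```python
import itertools
from concurrent.futures import ProcessPoolExecutor


def process_cell(cell):
    """Hypothetical worker: receives one (x, y) tuple per call."""
    x, y = cell
    return x * y


def main():
    # Same pairs, in the same order, as two nested loops over range(1, 5).
    cells = itertools.product(range(1, 5), range(1, 5))
    with ProcessPoolExecutor(max_workers=2) as pool:
        # map() returns results in input order; chunksize batches the
        # items sent to each worker to reduce inter-process overhead.
        results = list(pool.map(process_cell, cells, chunksize=4))
    print(results)


if __name__ == "__main__":
    main()
```

The worker is defined at module level because the pool has to pickle a reference to it, and the `if __name__ == "__main__"` guard keeps child processes from re-running the pool setup on platforms that spawn rather than fork.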
A simple way to turn nested loops into an iterable is to write a generator function. Using your code as an example, it could look something like this:
    def param_iterator(row, col, gVerticalStep, gHorizontalStep):
        rowsCnt = colsCnt = 0
        for x in range(row, row + gVerticalStep):
            if rowsCnt == gVerticalStep:
                rowsCnt = 0
            for y in range(col, col + gHorizontalStep):
                if colsCnt == gHorizontalStep:
                    colsCnt = 0
                # One tuple of arguments per task; self.run receives it as a single argument
                yield (x, y, rowsCnt, colsCnt)

    with PoolExec(...) as poolExec:
        params = param_iterator(row, col, gVerticalStep, gHorizontalStep)
        results = poolExec.map(self.run, params)
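To make the idea concrete, here is a self-contained sketch of the generator approach. Several things in it are assumptions: the question does not show where rowsCnt and colsCnt are incremented, so the sketch assumes each counter advances by one per iteration of its loop; run is a hypothetical stand-in for self.run; and ProcessPoolExecutor stands in for PoolExec. The point is that the loop-carried state is computed sequentially in the parent process and shipped to the workers as part of each task's arguments.

```python
from concurrent.futures import ProcessPoolExecutor


def param_iterator(row, col, v_step, h_step):
    """Yield (x, y, rows_cnt, cols_cnt) in nested-loop order."""
    rows_cnt = cols_cnt = 0
    for x in range(row, row + v_step):
        if rows_cnt == v_step:
            rows_cnt = 0
        for y in range(col, col + h_step):
            if cols_cnt == h_step:
                cols_cnt = 0
            yield (x, y, rows_cnt, cols_cnt)
            cols_cnt += 1  # assumed increment, not shown in the question
        rows_cnt += 1  # assumed increment, not shown in the question


def run(params):
    """Hypothetical worker: unpacks the single tuple that map() passes in."""
    x, y, rows_cnt, cols_cnt = params
    return (x + rows_cnt, y + cols_cnt)


def main():
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(run, param_iterator(0, 0, 2, 3), chunksize=2))
    print(results)


if __name__ == "__main__":
    main()
```

Because the counters travel with each tuple, the workers never share or mutate them, so the tasks stay independent of one another.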