Python Pool doesn't run the function on each item in a list
Question:
I tried to use Pool for multiprocessing. Here is my code:
from multiprocessing import Pool

def func1(x):
    print x

if __name__ == "__main__":
    myList = ["111","222","333","444"]
    p = Pool(processes=4)
    res1 = p.map(func1,myList)
What I expect the output to be:
222
111
333
444
What it actually prints:
3332
122411
44
What am I doing wrong here?
Answers:
As Kounis said in the comment, Pool creates a set of worker processes for asynchronous task processing. Also, the documentation for Pool.map explicitly says that the processing is done in parallel.
In practice, since you are creating a Pool with 4 worker processes, when you submit a map over 4 elements, each element can be picked up immediately by a worker. Therefore, the print x statements are executed at practically the same time.
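To make the dispatch described above visible, here is a minimal sketch that returns each item together with the ID of the process that handled it, and prints only from the parent process. It reuses the names from the question; the use of os.getpid and the output wording are my own additions for illustration. The process IDs will differ on every run, and nothing guarantees that each item lands on a different worker.

```python
import os
from multiprocessing import Pool

def func1(x):
    # Return the item together with the ID of the process that handled it.
    return x, os.getpid()

if __name__ == "__main__":
    myList = ["111", "222", "333", "444"]
    p = Pool(processes=4)
    try:
        # map() blocks until every item is done and returns the results
        # in the same order as the input list.
        res1 = p.map(func1, myList)
    finally:
        p.close()
        p.join()
    # Only the parent process prints, so the lines cannot interleave.
    for item, pid in res1:
        print("{} handled by process {}".format(item, pid))
```

Because the workers only return values and the parent does all the printing, each item ends up on its own line.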
There is nothing wrong with your code; it does exactly what should be expected of it.
What probably confuses you is the belief that whichever call executes first should block all the others until it has finished. Thankfully this is not true; if it were, it would not be parallel programming. You are launching 4 parallel processes that all write to the standard output (stdout) with no regard for precedence, and thus you create a race condition that is resolved arbitrarily, so the output of the different processes can interleave.
Try writing to 4 separate files instead of stdout (see code below) and you will see the results of parallel code more clearly; the files are generated simultaneously instead of in series.
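from multiprocessing import Pool

def func1(x):
    # Each worker writes to its own file, so no output stream is shared.
    with open('file_{}'.format(x), 'w') as f:
        f.write(x)

if __name__ == "__main__":
    myList = ["111","222","333","444"]
    p = Pool(processes=4)
    res1 = p.map(func1,myList)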