In what situation do we need to use `multiprocessing.Pool.imap_unordered`?

Question

The order of results from the iterator returned by imap_unordered is arbitrary, and it doesn’t seem to run any faster than imap (which I checked with the following code), so why would one use this method?

from multiprocessing import Pool
import time

def square(i):
    time.sleep(0.01)
    return i ** 2

p = Pool(4)
nums = range(50)

start = time.time()
print('Using imap')
for i in p.imap(square, nums):
    pass
print('Time elapsed: %s' % (time.time() - start))

start = time.time()
print('Using imap_unordered')
for i in p.imap_unordered(square, nums):
    pass
print('Time elapsed: %s' % (time.time() - start))

Asked By: satoru


Answer 1

Using pool.imap_unordered instead of pool.imap will not have a large effect on the total running time of your code. It might be a little faster, but not by too much.

What it may do, however, is make the intervals at which values become available in your iteration more even. That is, if you have operations that can take very different amounts of time (rather than the consistent 0.01 seconds you were using in your example), imap_unordered can smooth things out by yielding faster-calculated values ahead of slower-calculated values. The regular imap will delay yielding the faster ones until after the slower ones ahead of them have been computed (but this does not delay the worker processes moving on to more calculations, only the time at which you get to see the results).

Try making your work function sleep for i*0.1 seconds, shuffling your input list and printing i in your loops. You’ll be able to see the difference between the two imap versions. Here’s my version (the main function and the if __name__ == '__main__' boilerplate are required to run correctly on Windows):

from multiprocessing import Pool
import time
import random

def work(i):
    time.sleep(0.1*i)
    return i

def main():
    p = Pool(4)
    # shuffle() needs a mutable sequence, so build a list rather than a bare range
    nums = list(range(50))
    random.shuffle(nums)

    start = time.time()
    print('Using imap')
    for i in p.imap(work, nums):
        print(i)
    print('Time elapsed: %s' % (time.time() - start))

    start = time.time()
    print('Using imap_unordered')
    for i in p.imap_unordered(work, nums):
        print(i)
    print('Time elapsed: %s' % (time.time() - start))

if __name__ == "__main__":
    main()

The imap version will have long pauses while values like 49 are being handled (taking 4.9 seconds), then it will fly over a bunch of other values (which were calculated by the other processes while we were waiting for 49 to be processed). In contrast, the imap_unordered loop will usually not pause nearly as long at one time. It will have more frequent, but shorter pauses, and its output will tend to be smoother.
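To put a rough number on that difference, here is a minimal sketch that measures how long each method takes to hand back its first result when the slowest task sits at the front of the input. It is an illustration under assumptions, not part of the original experiment: it uses multiprocessing.pool.ThreadPool (which exposes the same imap and imap_unordered methods as Pool) so that it runs without the process-start boilerplate, and the delays list and the first_result_latency helper are made up for this example.

```python
import time
from multiprocessing.pool import ThreadPool  # assumption: same imap API as Pool


def work(delay):
    # Simulate a task whose cost is its argument, in seconds.
    time.sleep(delay)
    return delay


def first_result_latency(method_name, delays, workers=4):
    """Hypothetical helper: return (seconds until the first result, all results)."""
    with ThreadPool(workers) as pool:
        start = time.perf_counter()
        iterator = getattr(pool, method_name)(work, delays)
        first = next(iterator)  # blocks until this method can yield something
        latency = time.perf_counter() - start
        results = [first] + list(iterator)  # drain the rest in arrival order
    return latency, results


# One slow task at the front, three fast ones behind it.
delays = [0.4, 0.05, 0.05, 0.05]

ordered_latency, ordered = first_result_latency("imap", delays)
unordered_latency, unordered = first_result_latency("imap_unordered", delays)

print("imap first result after %.2fs: %s" % (ordered_latency, ordered))
print("imap_unordered first result after %.2fs: %s" % (unordered_latency, unordered))
```

With four workers, imap should not yield anything until the 0.4-second task at the head of the input has finished, while imap_unordered should yield one of the 0.05-second results almost immediately. The total time for each loop should be about the same, which matches what the question observed.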

Answered By: Blckknght

Answer 2

imap_unordered also seems to use less memory over time than imap. At least that’s what I experienced with an iterator over millions of things.

Answered By: Ed Summers
