Naive and easiest way to decompose independent loop into parallel threads/processes

Question

I have a loop of intensive calculations that I want to speed up
on a multicore processor. The calculations are independent, so they
could all be performed in parallel. What is the easiest way to do that in
Python?
Suppose the results of those calculations have to be summed at the end. How can I easily add them to a list or to a float variable?

Thanks for all your pedagogic answers and using python libraries ;o)

Asked By: sol


Answer 1

Multicore processing is a bit difficult to do in CPython (thanks to the GIL). However, there is the multiprocessing module, which lets you use subprocesses (not threads) to split your work across multiple cores.

The module is relatively straightforward to use as long as your code can really be split into multiple parts and doesn’t depend on shared objects. The linked documentation should be a good starting point.

Answered By: Martin Thurau

Answer 2

From my experience, multi-threading is probably not going to be a viable option for speeding up CPU-bound work in CPython (due to the Global Interpreter Lock).

A good alternative is the multiprocessing module. This may or may not work well, depending on how much data you end up having to pass around from one process to another.
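One way to keep the data passed between processes small is to send each worker only a description of its share of the work and to get back a single number. The following is a minimal sketch under that assumption, not a definitive implementation: `partial_sum` and `make_chunks` are hypothetical helper names, and the sum of squares merely stands in for the real calculation.

```python
from multiprocessing import Pool


def partial_sum(bounds):
    """Sum i*i for i in [start, stop). Two ints go in, one number comes out."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def make_chunks(n, n_chunks):
    """Split range(n) into at most n_chunks contiguous (start, stop) pairs."""
    step = max(1, -(-n // n_chunks))  # ceiling division, at least 1
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


if __name__ == '__main__':
    chunks = make_chunks(1000000, 4)
    with Pool(processes=4) as pool:
        # Each worker receives one small tuple and returns one partial sum.
        total = sum(pool.map(partial_sum, chunks))
    print(total)
```

Each task here costs one pickled tuple on the way in and one integer on the way out, so the time spent on inter-process communication stays small compared with the computation.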

Another good alternative would be to consider using numpy for your computations (if you aren’t already). If you can vectorize your code, you should be able to achieve significant speedups even on a single core. Depending on what exactly you’re doing and on your build of numpy, it might even be able to transparently distribute the computations across multiple cores.
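To illustrate what vectorizing means here, the sketch below computes the same sum of squares once with an explicit Python loop and once with numpy array operations. The function names are hypothetical, and the sum of squares is only a stand-in for the real calculation; whether your own loop can be rewritten this way depends on what it computes.

```python
import numpy as np


def sum_of_squares_loop(n):
    """Plain Python loop: one interpreted iteration per element."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_vectorized(n):
    """Same result, with the loop running inside numpy's compiled code."""
    x = np.arange(n, dtype=np.float64)
    return float(np.sum(x * x))


if __name__ == '__main__':
    print(sum_of_squares_loop(1000))
    print(sum_of_squares_vectorized(1000))
```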

Edit: Here is a complete example of using the multiprocessing module to perform a simple computation. It uses four processes to compute the squares of the numbers from zero to nine.

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    inputs = range(10)
    result = pool.map(f, inputs)
    print(result)                         # parenthesized form also works on Python 3

This is meant as a simple illustration. Given the trivial nature of f(), this parallel version will almost certainly be slower than computing the same thing serially.
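As for the second part of the question, pool.map already returns the results as a list in input order, so summing them is a matter of calling the built-in sum on that list. A minimal sketch, assuming Python 3 and reusing the same trivial f() as a placeholder for the real calculation:

```python
from multiprocessing import Pool


def f(x):
    return x * x


if __name__ == '__main__':
    with Pool(processes=4) as pool:
        squares = pool.map(f, range(10))  # list of results, in input order
    total = sum(squares)                  # reduce to a single number
    print(squares)
    print(total)                          # 285
```

No shared float variable is needed: each worker returns its own value, and the parent process does the final addition.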

Answered By: NPE
