Naive and easiest way to decompose independent loop into parallel threads/processes
Question:
- I have a loop of intensive calculations, I want them to be
accelerated using the multicore processor as they are independent:
all performed in parallel. What the easiest way to do that in
python?
- Let’s imagine that those calculations have to be summed at the end. How to easily add them to a list or a float variable?
Thanks for all your pedagogic answers and using python libraries ;o)
Answers:
Multicore processing is a bit difficult to do in CPython (thanks to the GIL ). However, their is the multiprocessing module which allows to use subprocesses (not threads) to split you work on multiple cores.
The module is relatively straight forward to use as long as your code can really be split into multiple parts and doesn’t depend on shared objects. The linked documentation should be a good starting point.
From my experience, multi-threading is probably not going to be a viable option for speeding things up (due to the Global Interpreter Lock).
A good alternative is the multiprocessing
module. This may or may not work well, depending on how much data you end up having to pass around from one process to another.
Another good alternative would be to consider using numpy
for your computations (if you aren’t already). If you can vectorize your code, you should be able to achieve significant speedups even on a single core. Depending on what exactly you’re doing and on your build of numpy
, it might even be able to transparently distribute the computations across multiple cores.
edit Here is a complete example of using the multiprocessing
module to perform a simple computation. It uses four processes to compute the squares of the numbers from zero to nine.
from multiprocessing import Pool
def f(x):
return x*x
if __name__ == '__main__':
pool = Pool(processes=4) # start 4 worker processes
inputs = range(10)
result = pool.map(f, inputs)
print result
This is meant as a simple illustration. Given the trivial nature of f()
, this parallel version will almost certainly be slower than computing the same thing serially.
- I have a loop of intensive calculations, I want them to be
accelerated using the multicore processor as they are independent:
all performed in parallel. What the easiest way to do that in
python? - Let’s imagine that those calculations have to be summed at the end. How to easily add them to a list or a float variable?
Thanks for all your pedagogic answers and using python libraries ;o)
Multicore processing is a bit difficult to do in CPython (thanks to the GIL ). However, their is the multiprocessing module which allows to use subprocesses (not threads) to split you work on multiple cores.
The module is relatively straight forward to use as long as your code can really be split into multiple parts and doesn’t depend on shared objects. The linked documentation should be a good starting point.
From my experience, multi-threading is probably not going to be a viable option for speeding things up (due to the Global Interpreter Lock).
A good alternative is the multiprocessing
module. This may or may not work well, depending on how much data you end up having to pass around from one process to another.
Another good alternative would be to consider using numpy
for your computations (if you aren’t already). If you can vectorize your code, you should be able to achieve significant speedups even on a single core. Depending on what exactly you’re doing and on your build of numpy
, it might even be able to transparently distribute the computations across multiple cores.
edit Here is a complete example of using the multiprocessing
module to perform a simple computation. It uses four processes to compute the squares of the numbers from zero to nine.
from multiprocessing import Pool
def f(x):
return x*x
if __name__ == '__main__':
pool = Pool(processes=4) # start 4 worker processes
inputs = range(10)
result = pool.map(f, inputs)
print result
This is meant as a simple illustration. Given the trivial nature of f()
, this parallel version will almost certainly be slower than computing the same thing serially.