How to create lists by running functions in parallel in Python

Question:

I want to create two lists by running two functions in parallel, where each function returns one value per call. My code below works, but it still takes too much time. Is there a more efficient way to parallelize this code?

import random
import time
from joblib import Parallel, delayed
 
catchments = 50   #define number of catchments to plot here
randomlist = random.sample(range(2, 2100), catchments)

def budx(i):   #Time-taking task....
    try:        
        catch = slicecatch(i)
        return catch.PETNatVeg.mean().values/catch.Prec.mean().values
    except IndexError as e:
        pass    
        
def budy(i):   #Time-taking task....
    try:        
        catch = slicecatch(i)
        return catch.TotalET.mean().values/catch.Prec.mean().values
    except IndexError as e:
        pass 
    
        
 
start_time = time.perf_counter()

bud_x = Parallel(n_jobs=-1)(delayed(budx)(i) for i in randomlist)
bud_y = Parallel(n_jobs=-1)(delayed(budy)(i) for i in randomlist) 

finish_time = time.perf_counter()
Asked By: Zeeshan Asghar


Answers:

The way you’ve written your code, you’re first running all your budx instances, waiting for them to complete, and only then running your budy instances. That is, you are sequentially running two sets of parallel tasks.

Here’s one possible way of running both sets of tasks in a single parallel call, so that budx and budy jobs share the same pool of workers. Note that (a) I was not previously familiar with joblib, so there may be a more canonical form, and (b) I’ve replaced your budx and budy implementations with code that I can actually run:

import time
from joblib import Parallel, delayed
import random
 
catchments = 50   #define number of catchments to plot here
randomlist = random.sample(range(2, 2100), catchments)

def budx(i):   #Time-taking task....
    print("start budx", i)
    time.sleep(random.randint(0, 10))
    print("end budx", i)
    return ("budx", i)

def budy(i):   #Time-taking task....
    print("start budy", i)
    time.sleep(random.randint(0, 10))
    print("end budy", i)
    return ("budy", i)
 
start_time = time.perf_counter()

results = Parallel(n_jobs=-1)(
        [delayed(budx)(i) for i in range(5)] +
        [delayed(budy)(i) for i in range(5)])

finish_time = time.perf_counter()

print("total time:", finish_time - start_time)
print("results", results)
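
Since the question asks for two separate lists, the combined results list can be split afterwards. As far as I know, joblib returns results in the same order the tasks were submitted, so the first half belongs to budx and the second half to budy. The snippet below is only a sketch: `inputs` is a name introduced here for illustration, and `results` is a hand-built stand-in for what the Parallel call above would return:

```python
inputs = list(range(5))

# Stand-in for the list returned by the combined Parallel(...) call:
# all budx results come first, followed by all budy results.
results = [("budx", i) for i in inputs] + [("budy", i) for i in inputs]

# Split the combined list back into one list per function.
n = len(inputs)
bud_x = results[:n]
bud_y = results[n:]

print("bud_x", bud_x)
print("bud_y", bud_y)
```

This relies on both functions being called over the same inputs, so that each half has exactly len(inputs) entries.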

If I were writing this I would probably opt for native Python tools like concurrent.futures rather than a third-party module like joblib (unless there are additional features provided by joblib that make your life easier).
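
For comparison, here is a rough sketch of the same idea with concurrent.futures. The budx and budy bodies are again placeholders rather than your real code. I've used ThreadPoolExecutor to keep the example simple; if your work is CPU-bound, ProcessPoolExecutor offers the same submit interface and may be the better fit, but I haven't measured either against your workload:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def budx(i):
    # Placeholder for the real, slow computation.
    time.sleep(0.01)
    return ("budx", i)

def budy(i):
    # Placeholder for the real, slow computation.
    time.sleep(0.01)
    return ("budy", i)

inputs = range(5)

with ThreadPoolExecutor() as executor:
    # Submit every task up front so budx and budy calls share one pool.
    x_futures = [executor.submit(budx, i) for i in inputs]
    y_futures = [executor.submit(budy, i) for i in inputs]

    # Collect results in submission order, one list per function.
    bud_x = [f.result() for f in x_futures]
    bud_y = [f.result() for f in y_futures]

print("bud_x", bud_x)
print("bud_y", bud_y)
```

Keeping the futures in two separate lists gives you the two result lists directly, without having to split a combined list afterwards.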

Answered By: larsks