Why is parallelized code with concurrent.futures slower than regular code?

Question:

I tried parallelizing with concurrent.futures, expecting the parallelized code to be faster.

I wrote a simple piece of code to test the parallelization. What the code does is not important; I'm mainly interested in the speed of the plain code versus the parallelized code. All it does is calculate the correlation between the lists from sigs and data_mat and store the values in corr_coefs. You can see the plain code below:

from time import time
import numpy as np

sigs = [
    [91, 43, 44, 49, 64, 37, 61, 31, 73],
    [59, 94, 91, 12, 47, 44, 93, 7, 84],
    [47, 76, 24, 87, 2, 83, 77, 60, 36],
    [83, 68, 3, 49, 14, 12, 51, 36, 22]
]

data_mat = [
    [83, 68, 3, 49, 14, 12, 51, 36, 22],
    [8, 78, 44, 40, 39, 67, 63, 64, 34],
    [49, 24, 77, 91, 66, 44, 83, 30, 99],
    [97, 40, 69, 7, 24, 70, 63, 52, 81],
    [26, 62, 53, 36, 72, 54, 85, 94, 31],
    [99, 52, 87, 52, 50, 9, 22, 72, 62],
    [91, 15, 54, 84, 89, 15, 43, 31, 9],
    [39, 26, 36, 81, 65, 50, 67, 12, 19],
    [67, 22, 86, 24, 38, 30, 45, 94, 44],
    # etc.
]

execution_time_start = time()

corr_coefs = []
for sig in sigs:
    for data_mat_row in data_mat:
        corr = np.corrcoef(np.square(sig), np.square(data_mat_row))
        corr_coefs.append(corr[0, 1])

execution_time_end = time()
elapsed_time = execution_time_end - execution_time_start
print(f'Execution time (without parallelization):  = {elapsed_time:.20f} s')

I tried to parallelize this code using concurrent.futures. The data_mat and sigs lists are the same (I just rewrote the loop):

from time import time
import numpy as np
import concurrent.futures

sigs = [
    [91, 43, 44, 49, 64, 37, 61, 31, 73],
    [59, 94, 91, 12, 47, 44, 93, 7, 84],
    [47, 76, 24, 87, 2, 83, 77, 60, 36],
    [83, 68, 3, 49, 14, 12, 51, 36, 22]
]

data_mat = [
    [83, 68, 3, 49, 14, 12, 51, 36, 22],
    [8, 78, 44, 40, 39, 67, 63, 64, 34],
    [49, 24, 77, 91, 66, 44, 83, 30, 99],
    [97, 40, 69, 7, 24, 70, 63, 52, 81],
    [26, 62, 53, 36, 72, 54, 85, 94, 31],
    [99, 52, 87, 52, 50, 9, 22, 72, 62],
    [91, 15, 54, 84, 89, 15, 43, 31, 9],
    [39, 26, 36, 81, 65, 50, 67, 12, 19],
    [67, 22, 86, 24, 38, 30, 45, 94, 44],
    # etc.
]

execution_time_start = time()


corr_coefs = []
with concurrent.futures.ThreadPoolExecutor() as executor:
    future_corr_coefs = {
        executor.submit(np.corrcoef, np.square(sig), np.square(data_mat_row)): (sig, data_mat_row)
        for sig in sigs for data_mat_row in data_mat
    }
    for future in concurrent.futures.as_completed(future_corr_coefs):
        sig, data_mat_row = future_corr_coefs[future]
        corr = future.result()
        corr_coefs.append(corr[0, 1])

execution_time_end = time()
elapsed_time = execution_time_end - execution_time_start
print(f'Execution time (with parallelization):  = {elapsed_time:.20f} s')

I expected the rewritten code to be faster, but I got these outputs:

Execution time (without parallelization):  = 1.30910301208496093750 s
Execution time (with parallelization):  = 2.38465380668640136719 s

I also tried a larger data set by expanding the data_mat list, but the code is still slower. Does anyone have advice that would help? I suspect it might be overhead, but I am not able to explain how…

Asked By: AxieKendy


Answers:

I found the answer. The parallel code can be faster, but sigs and data_mat have to be much larger for it to pay off. If the input data set is small, it is pointless to use concurrent.futures, because the overhead of creating, scheduling, and collecting the tasks outweighs the computation itself: each submit call here wraps a single np.corrcoef on nine-element lists, so the bookkeeping costs far more than the actual work. (With ThreadPoolExecutor, Python's GIL also limits how much pure-Python work can run in parallel, although NumPy releases it during large array operations.) If the data set is large and the work inside the loop is heavier, the parallelization becomes faster…
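One way to shrink that overhead even on modest inputs is to make each task coarser, so the executor bookkeeping is paid once per sig instead of once per (sig, row) pair. A minimal sketch of that idea, assuming the same data layout as in the question (the helper name corr_for_sig is my own, and the lists are truncated for brevity):

```python
import concurrent.futures
import numpy as np

sigs = [
    [91, 43, 44, 49, 64, 37, 61, 31, 73],
    [59, 94, 91, 12, 47, 44, 93, 7, 84],
]
data_mat = [
    [83, 68, 3, 49, 14, 12, 51, 36, 22],
    [8, 78, 44, 40, 39, 67, 63, 64, 34],
    [49, 24, 77, 91, 66, 44, 83, 30, 99],
]

def corr_for_sig(sig, rows):
    # One task handles a whole batch of rows, so the executor
    # overhead is paid once per sig instead of once per pair.
    sq_sig = np.square(sig)
    return [np.corrcoef(sq_sig, np.square(row))[0, 1] for row in rows]

with concurrent.futures.ThreadPoolExecutor() as executor:
    # executor.map keeps results in order, so the flattened list
    # matches the serial nested-loop ordering.
    batches = executor.map(corr_for_sig, sigs, [data_mat] * len(sigs))
    corr_coefs = [c for batch in batches for c in batch]

print(len(corr_coefs))  # 2 sigs x 3 rows = 6 coefficients
```

This keeps the results in the same order as the serial nested loop, which the as_completed version in the question does not guarantee.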

Answered By: AxieKendy