How to use multiprocessing to calculate several functions simultaneously

Question:

I’m learning how to use multiprocessing and started with simple tasks:

import multiprocessing as mp
import time

starttime = time.time()

Result_1 = []
Result_2 = []
Result_3 = []

def Calculation_1():
    numbers = list(range(0, 10000000))
    for num in numbers:
        Result_1.append(num ** 0.5)

def Calculation_2():
    numbers = list(range(0, 10000000))
    for num in numbers:
        Result_2.append(num ** 2)

def Calculation_3():
    numbers = list(range(0, 10000000))
    for num in numbers:
        Result_3.append(num ** 3)

if __name__ == "__main__":

    p1 = mp.Process(target = Calculation_1)
    p2 = mp.Process(target = Calculation_2)
    p3 = mp.Process(target = Calculation_3)
    
    p1.start()
    p2.start()
    p3.start()
    
    p1.join()
    p2.join()
    p3.join()
    
    endtime = time.time()
    print("Time =", "{:.2f}".format((endtime - starttime) * (10 ** 3)), "ms")

The goal is to calculate all three functions simultaneously instead of sequentially. However, all my result lists are blank. How do I get this right?

Thank you very much.

Asked By: vnc89

||

Answers:

Use threading.Thread instead of multiprocessing.Process and it will work fine.

from threading import Thread
import time

starttime = time.time()

Result_1 = []
Result_2 = []
Result_3 = []

def Calculation_1():
    numbers = list(range(0, 10000000))
    for num in numbers:
        Result_1.append(num ** 0.5)

def Calculation_2():
    numbers = list(range(0, 10000000))
    for num in numbers:
        Result_2.append(num ** 2)

def Calculation_3():
    numbers = list(range(0, 10000000))
    for num in numbers:
        Result_3.append(num ** 3)

if __name__ == "__main__":

    p1 = Thread(target = Calculation_1)
    p2 = Thread(target = Calculation_2)
    p3 = Thread(target = Calculation_3)
    
    p1.start()
    p2.start()
    p3.start()
    
    p1.join()
    p2.join()
    p3.join()
    
    endtime = time.time()
    print("Time =", "{:.2f}".format((endtime - starttime) * (10 ** 3)), "ms")


Answered By: codester_09

Your program doesn’t work because each Process in Python occupies its own memory space. You have 4 Processes: the main one, and three other ones that you create by calling Process() 3 times. All four of them have their own set of global variables, which means that all 4 Processes have global variables named "Result_1", "Result_2" and "Result_3". So each Process works with its own version of these objects, and they are not the same object. This is far from obvious when you just read the source code, and it definitely takes a while to wrap your head around this concept.
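The separate copies are easy to observe directly. The following is a minimal sketch, not part of the original question: the names results, worker and run_demo are made up for illustration. The child process appends to a module-level list, and the parent then checks its own copy.

```python
import multiprocessing as mp

results = []  # module-level list; each process ends up with its own copy


def worker():
    # Runs in the child process and changes only the child's copy of `results`.
    results.append(42)
    print("child sees", len(results), "item(s)")


def run_demo():
    # Hypothetical helper: start one child, wait for it, then inspect the parent's list.
    p = mp.Process(target=worker)
    p.start()
    p.join()
    # The parent's list is untouched, even though the child appended to "it".
    return p.exitcode, len(results)


if __name__ == "__main__":
    exitcode, parent_len = run_demo()
    print("child exit code:", exitcode, "| parent sees", parent_len, "item(s)")
```

The child reports one item while the parent still sees an empty list, which is the same effect that leaves Result_1, Result_2 and Result_3 empty in the question.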

When Process p1 modifies Result_1, it modifies its own instance of that list. It’s a different object than the one used by your main Process, even though at the source code level they both have the same name. When you look at the contents of Result_1 in your main Process, it is empty. That’s because your main Process doesn’t know what Process p1 did. Sharing data between Processes is not a trivial problem. The Python standard library has some tools for this, but they must be used carefully.
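One of those standard-library tools is concurrent.futures.ProcessPoolExecutor, which sends each function's return value back to the parent process. The sketch below is one possible approach, assuming the results are small enough that copying them between processes is acceptable. The lowercase function names, run_all and the reduced LENGTH are illustrative choices, not taken from the question.

```python
from concurrent.futures import ProcessPoolExecutor

LENGTH = 100_000  # illustrative; smaller than the 10_000_000 used in the question


def calculation_1():
    return [n ** 0.5 for n in range(LENGTH)]


def calculation_2():
    return [n ** 2 for n in range(LENGTH)]


def calculation_3():
    return [n ** 3 for n in range(LENGTH)]


def run_all():
    # Each function runs in its own worker process; result() blocks until
    # that worker has finished and its return value has been copied back.
    with ProcessPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(f) for f in (calculation_1, calculation_2, calculation_3)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    r1, r2, r3 = run_all()
    print(r1[4], r2[-1], len(r3))
```

Returning the lists this way means they are pickled and copied into the parent, so for very large results the transfer cost can eat into the time saved by running in parallel.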

Multithreading is different. Threads share a memory space, so the solution presented by codester_09 works. There is only one list named Result_1. When the secondary thread modifies it, the main thread can access the modified data immediately. No problem. However, that solution does not accomplish your stated goal of calculating all three functions simultaneously. In the standard CPython interpreter, the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, so threading merely creates the illusion of multitasking by switching rapidly from one thread to another. You can easily verify this by adding the following 5 lines to codester_09’s listing:

t0 = time.time()
Calculation_1()
Calculation_2()
Calculation_3()
print(time.time() - t0)

These lines run the three calculations sequentially, one after the other, and on my machine (Win10) that takes just as long as the threaded version.
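The same comparison can be packaged as one self-contained script. This is only a sketch, not the answerer's code: fill, POWERS, run_sequential, run_threaded and the smaller N are names and values introduced here for illustration. The timings depend on the machine and the interpreter, and a free-threaded CPython build would likely behave differently.

```python
import time
from threading import Thread

N = 200_000  # illustrative size; the listings above use 10_000_000
POWERS = (0.5, 2, 3)  # the three exponents used by Calculation_1..3


def fill(out, power):
    # CPU-bound loop in pure Python, same shape as the Calculation_* functions.
    for n in range(N):
        out.append(n ** power)


def run_sequential():
    outs = [[] for _ in POWERS]
    t0 = time.perf_counter()
    for out, power in zip(outs, POWERS):
        fill(out, power)
    return outs, time.perf_counter() - t0


def run_threaded():
    outs = [[] for _ in POWERS]
    threads = [Thread(target=fill, args=(out, power))
               for out, power in zip(outs, POWERS)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outs, time.perf_counter() - t0


if __name__ == "__main__":
    seq, seq_time = run_sequential()
    thr, thr_time = run_threaded()
    print("same results:", seq == thr)
    print("sequential: {:.2f} s, threaded: {:.2f} s".format(seq_time, thr_time))
```

Both variants produce identical lists. On an interpreter with a GIL, the two timings should come out close to each other for this kind of pure-Python arithmetic.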

The following program utilizes shared memory arrays, part of the multiprocessing module. Three such arrays are created and passed to the secondary Processes. I inserted a print statement to prove that the arrays are updated.

The program’s overall execution time is less than half that of the sequential version, so the three calculations really do run simultaneously, which is your stated goal. The speedup falls short of the threefold you might expect, due to some system-level complexities of using shared memory (I think). But it is distinctly faster than the threaded version, and it works.

import multiprocessing as mp
import time

LENGTH = 10000000

def Calculation_1(x):
    for n in range(LENGTH):
        x[n] = n ** 0.5
    print("C1 finished")

def Calculation_2(x):
    for n in range(LENGTH):
        x[n] = n ** 2
    print("C2 finished")

def Calculation_3(x):
    for n in range(LENGTH):
        x[n] = n ** 3
    print("C3 finished")
    
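def main():
    starttime = time.time()
    # Typecode "d" is a C double, so every result is stored as a float
    # (large values of n ** 3 are rounded). lock=False skips the
    # synchronization wrapper, which is safe here because each array is
    # written by exactly one process.
    x1 = mp.Array("d", LENGTH, lock=False)
    x2 = mp.Array("d", LENGTH, lock=False)
    x3 = mp.Array("d", LENGTH, lock=False)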

    p1 = mp.Process(target = Calculation_1, args=(x1,))
    p2 = mp.Process(target = Calculation_2, args=(x2,))
    p3 = mp.Process(target = Calculation_3, args=(x3,))

    p1.start()
    p2.start()
    p3.start()

    p1.join()
    p2.join()
    p3.join()

    for x in (x1, x2, x3):
        print(x[1], x[-1], len(x))

    endtime = time.time()
    print("Time =", "{:.2f}".format((endtime - starttime) * (10 ** 3)), "ms")

if __name__ == "__main__":
    main()

Answered By: Paul Cornelius