How to run two functions in parallel in Python?

Question:

I am trying to run two chunks of code in parallel, but cannot get it to work. Both functions are for loops that take around 30 minutes to run in total, and each returns a list of dictionaries as its result. Running them separately works fine, but I can’t get them to run in parallel…

#When I run these functions separately, it sort of looks like this:
import time

def functionA(A, B):
    dictA=[]
    for i in list(range(A, B)):
        print(i, "from A")
        time.sleep(1)
        for p in list(range(0, 10)):
            dictA.append({i:p})
    return(dictA)

def functionB(C, D):
    dictB=[]
    for i in list(range(C, D)):
        print(i, "from B")
        time.sleep(1)
        for p in list(range(0, 10)):
            dictB.append({i:p})
    return(dictB)

DictA          = functionA(0, 10)
DictB          = functionB(10, 20)

#I have tried to run them in parallel, but they still run separately, and in the end the variable I would like to write to is just a thread:
import threading
e = threading.Event()
DictA  = threading.Thread(target=functionA(0, 10))
DictA.start()               
DictB  = threading.Thread(target= functionB(10, 20))
DictB.start()
#Further pieces of code to be executed after threading completed

This does not run the functions in parallel but one after the other, as the print statements show:

0 from A
1 from A
2 from A
3 from A
4 from A
5 from A
6 from A
7 from A
8 from A
9 from A
Exception in thread Thread-12:
Traceback (most recent call last):
  File "/home/maestro/.pyenv/versions/3.9.13/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/maestro/.pyenv/versions/3.9.13/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
TypeError: 'list' object is not callable
10 from B
11 from B
12 from B
13 from B
14 from B
15 from B
16 from B
17 from B
18 from B
19 from B
Exception in thread Thread-13:
Traceback (most recent call last):
  File "/home/maestro/.pyenv/versions/3.9.13/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/maestro/.pyenv/versions/3.9.13/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
TypeError: 'list' object is not callable

How can these functions be run in parallel?

Asked By: Rivered

||

Answers:

Your problem is here:

DictA  = threading.Thread(target=functionA(0, 10))

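Here you’re not passing functionA as the argument to target; you’re calling functionA and passing its return value to target. That’s why your functions appear to execute sequentially rather than in parallel: each call runs to completion in the main thread before the Thread object is even created. The thread then tries to call the returned list, which raises the TypeError shown in your traceback.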

You want:

DictA = threading.Thread(target=functionA, args=(0, 10))
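
To make the difference between the two forms concrete, here is a minimal sketch. The function `work` is a hypothetical stand-in for functionA and is not part of the original code.

```python
import threading


def work(start, stop):
    # Hypothetical stand-in for functionA: builds and returns a list.
    return list(range(start, stop))


# work(0, 3) runs immediately in the main thread and evaluates to a list.
# A list is not callable, which is what the TypeError in the question says.
called = work(0, 3)
print(callable(called))  # False
print(callable(work))    # True

# Passing the function object plus args defers the call to the new thread.
t = threading.Thread(target=work, args=(0, 3))
t.start()
t.join()
```

The `callable()` check is only there for illustration; the point is that `target` must receive the function itself, with its arguments supplied separately through `args`.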

A runnable version of your code looks like this:

import threading
import time


def functionA(A, B):
    dictA = []
    for i in list(range(A, B)):
        print(i, "from A")
        time.sleep(1)
        for p in list(range(0, 10)):
            dictA.append({i: p})
    return dictA


def functionB(C, D):
    dictB = []
    for i in list(range(C, D)):
        print(i, "from B")
        time.sleep(1)
        for p in list(range(0, 10)):
            dictB.append({i: p})
    return dictB


DictA = threading.Thread(target=functionA, args=(0, 10))
DictA.start()
DictB = threading.Thread(target=functionB, args=(10, 20))
DictB.start()

# Call `.join()` on each thread to wait for it to complete.
DictA.join()
DictB.join()

Running that code produces output like this (the exact interleaving can vary between runs):

0 from A
10 from B
11 from B
1 from A
2 from A
12 from B
3 from A
13 from B
4 from A
14 from B
5 from A
15 from B
6 from A
16 from B
7 from A
17 from B
18 from B
8 from A
9 from A
19 from B
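
The threads in the runnable version discard the lists that functionA and functionB return. One possible way to keep them is `concurrent.futures.ThreadPoolExecutor` from the standard library. The sketch below assumes a hypothetical function `build` that mirrors functionA/functionB with a much shorter sleep; it is an illustration, not the only approach.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build(start, stop):
    # Hypothetical stand-in for functionA/functionB with a shorter sleep.
    out = []
    for i in range(start, stop):
        time.sleep(0.01)
        for p in range(10):
            out.append({i: p})
    return out


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() schedules the call on a worker thread and returns a Future.
    future_a = pool.submit(build, 0, 10)
    future_b = pool.submit(build, 10, 20)
    # result() blocks until the call has finished and returns its value.
    DictA = future_a.result()
    DictB = future_b.result()

print(len(DictA), len(DictB))  # 100 100
```

Both calls are submitted before either result is requested, so the two loops overlap while they sleep.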
Answered By: larsks

Regular functions can return values that can be stored in variables.

def check_links_a(links, proxies):
    return f'A links are {links} and A proxies are {proxies}'


def check_links_b(links, proxies):
    return f'B links are {links} and B proxies are {proxies}'

links_to_check_a = 'A links'
links_to_check_b = 'B links'
proxy_list = 'some proxies'

result1 = check_links_a(links_to_check_a, proxy_list)
result2 = check_links_b(links_to_check_b, proxy_list)

print(result1)
print(result2)

result:

A links are A links and A proxies are some proxies
B links are B links and B proxies are some proxies

If you run a function in a thread, the variable you assign holds the Thread object itself, not the function's return value.

import threading
import time


def some_stuff_a(stuff):
    for i in range(10):
        print(f'some A {stuff}')
        time.sleep(1)
    return 'A stuff is done'


def some_stuff_b(stuff):
    for i in range(10):
        print(f'some B {stuff}')
        time.sleep(1)
    return 'B stuff is done'

a_stuff = 'from the A stuff var'
b_stuff = 'from the B stuff var'

thread1 = threading.Thread(target=some_stuff_a, args=[a_stuff])
thread1.start()
thread2 = threading.Thread(target=some_stuff_b, args=[b_stuff])
thread2.start()

print(thread1)
print(thread2)

The two print statements output something like:

<Thread(Thread-1 (some_stuff_a), started 2368)>
<Thread(Thread-2 (some_stuff_b), started 11224)>

The start() method of Thread does not forward the target's return value. As far as I know, the simplest way to get values out of a threaded function is to store them in global or instance variables; the standard library's queue.Queue and concurrent.futures are alternatives.

The reason your threads are not running in parallel is that you call the functions instead of passing them as the Thread target. Remove the arguments from the function call and pass them as args=[arg1, arg2] to the Thread, as seen above.
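
As a rough sketch of storing results without module-level globals, each thread can write its return value into a shared dictionary. The helper `run_and_store` and the simplified `some_stuff` function are hypothetical names introduced only for this example.

```python
import threading


def some_stuff(stuff):
    return f'{stuff} is done'


def run_and_store(results, key, func, *args):
    # Hypothetical helper: call func and keep its return value under key.
    results[key] = func(*args)


results = {}
threads = [
    threading.Thread(target=run_and_store, args=(results, 'a', some_stuff, 'A stuff')),
    threading.Thread(target=run_and_store, args=(results, 'b', some_stuff, 'B stuff')),
]
for t in threads:
    t.start()
# Wait for both threads before reading the results.
for t in threads:
    t.join()

print(results['a'])  # A stuff is done
print(results['b'])  # B stuff is done
```

Each thread writes to its own key, so the two threads never touch the same entry.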

Edit:

Here is a version using global variables.
The variables are defined outside the functions and declared with the global keyword inside each function, so the assignments update the module-level names.
I hope this explains what I mean:

import threading


def some_stuff_a(stuff):
    global data_from_stuff_a
    data_from_stuff_a = 'A stuff is done'


def some_stuff_b(stuff):
    global data_from_stuff_b
    data_from_stuff_b = 'B stuff is done'


a_stuff = 'from the A stuff var'
b_stuff = 'from the B stuff var'

data_from_stuff_a = ''
data_from_stuff_b = ''

thread1 = threading.Thread(target=some_stuff_a, args=[a_stuff])
thread1.start()
thread2 = threading.Thread(target=some_stuff_b, args=[b_stuff])
thread2.start()

# Wait for both threads to finish before reading the results.
thread1.join()
thread2.join()

print(data_from_stuff_a)
print(data_from_stuff_b)

And this is a version using instance variables. Instance variables belong to an object created from a class; you can recognize them by the self. prefix.

import threading
import time

class MyClass:
    def __init__(self):
        # instance variables
        self.data_from_stuff_a = ''
        self.data_from_stuff_b = ''

    def run_threads(self):
        # local variables
        a_stuff = 'from the A stuff var'
        b_stuff = 'from the B stuff var'

        thread1 = threading.Thread(target=self.some_stuff_a, args=[a_stuff])
        thread1.start()
        thread2 = threading.Thread(target=self.some_stuff_b, args=[b_stuff])
        thread2.start()
        thread1.join()
        thread2.join()

    def print_results(self):
        self._print_all_stuff()

    def some_stuff_a(self, stuff):
        time.sleep(2)
        self.data_from_stuff_a = 'A stuff is done'

    def some_stuff_b(self, stuff):
        time.sleep(4)
        self.data_from_stuff_b = 'B stuff is done'

    def _print_all_stuff(self):
        print(f'stuff A: {self.data_from_stuff_a}')
        print(f'stuff B: {self.data_from_stuff_b}')


my_object = MyClass()
my_object.run_threads()
my_object.print_results()
Answered By: Ovski