Python Using Multiprocessing

Question:

I am trying to use multiprocessing in Python 3.6. I have a for loop that runs a method with different arguments. Currently, it is running one at a time, which is taking quite a bit of time, so I am trying to use multiprocessing. Here is what I have:

def test(self):
    for key, value in dict.items():
        pool = Pool(processes=(cpu_count() - 1))
        pool.apply_async(self.thread_process, args=(key, value))
        pool.close()
        pool.join()


def thread_process(self, key, value):
    # self.__init__()
    print("For", key)

I think my code is using 3 processes to run one method, but I would like to run 1 method per process and I don't know how this is done. I am using 4 cores, btw.

Asked By: anderish


Answers:

You’re making a pool at every iteration of the for loop. Make a pool beforehand, apply the processes you’d like to run in multiprocessing, and then join them:

from multiprocessing import Pool, cpu_count
import time

def t():
    # Make a dummy dictionary
    d = {k: k**2 for k in range(10)}

    pool = Pool(processes=(cpu_count() - 1))

    for key, value in d.items():
        pool.apply_async(thread_process, args=(key, value))

    pool.close()
    pool.join()


def thread_process(key, value):
    time.sleep(0.1)  # Simulate a process taking some time to complete
    print("For", key, value)

if __name__ == '__main__':
    t()
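A small variation on the example above (my addition, not part of the original answer): apply_async returns AsyncResult handles, so if the worker returns a value you can keep the handles and collect the results with .get():

from multiprocessing import Pool, cpu_count

def thread_process(key, value):
    return key, value * 10  # pretend this is the real work

if __name__ == '__main__':
    d = {k: k**2 for k in range(10)}
    with Pool(processes=(cpu_count() - 1)) as pool:
        # keep the AsyncResult handles instead of discarding them
        handles = [pool.apply_async(thread_process, args=(k, v)) for k, v in d.items()]
        results = [h.get() for h in handles]  # blocks until each result is ready
    print(results)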

You’re not populating your multiprocessing.Pool with data – you’re re-initializing the pool on each loop. In your case you can use Pool.map() to do all the heavy work for you:

def thread_process(args):
    print(args)

def test():
    pool = Pool(processes=(cpu_count() - 1))
    pool.map(thread_process, your_dict.items())
    pool.close()

if __name__ == "__main__":  # important guard for cross-platform use
    test()

Also, given all those self arguments I reckon you’re snatching this off of a class instance and if so – don’t, unless you know what you’re doing. Since multiprocessing in Python essentially works as, well, multi-processing (unlike multi-threading), you don’t get to share your memory, which means your data is pickled when exchanged between processes, which means anything that cannot be pickled (like instance methods) doesn’t get called. You can read more about that problem in this answer.
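If the work really does live on a class, one common workaround is to keep the worker function at module level and pass it plain data instead of the instance. A minimal sketch (the Runner class here is hypothetical, just to mirror the question's shape):

from multiprocessing import Pool, cpu_count

def thread_process(key, value):  # module-level, so it pickles without trouble
    return key, value

class Runner:
    def __init__(self, data):
        self.data = data

    def test(self):
        with Pool(processes=(cpu_count() - 1)) as pool:
            # pass plain (key, value) pairs to the workers, not self
            return pool.starmap(thread_process, self.data.items())

if __name__ == '__main__':
    print(Runner({k: k**2 for k in range(10)}).test())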

Answered By: zwer

I think my code is using 3 processes to run one method, but I would like to run 1 method per process and I don't know how this is done. I am using 4 cores, btw.

No, you are in fact using the correct syntax here to utilize 3 cores to run an arbitrary function independently on each. You cannot magically utilize 3 cores to work together on one task without explicitly making that part of the algorithm itself / coding it yourself, often using threads (which do not work the same in Python as they do outside of the language).

You are, however, re-initializing the pool on every loop iteration; you'll need to do something like this instead to actually perform this properly:

    cpus_to_run_on = cpu_count() - 1
    pool = Pool(processes=cpus_to_run_on)
    # don't call a dictionary a dict, you will not be able to use dict() any
    # more after that point, that's like calling a variable len or abs, you
    # can't use those functions now
    pool.map(your_function, your_function_args)
    pool.close()
    pool.join()

Take a look at the Python multiprocessing docs for more specific information if you’d like to get a better understanding of how it works. Under the default CPython interpreter, you cannot use threading to do real multiprocessing. This is because of something called the global interpreter lock (GIL), which prevents concurrent execution of Python bytecode within a single process. The GIL doesn’t exist in some other implementations of the language, and it is not something other languages like C and C++ have to deal with (and thus you can actually use threads in parallel to work together on a task there, unlike in CPython).

Python gets around this issue by simply making multiple interpreter instances when using the multiprocessing module, and any message passing between instances is done via copying data between processes (i.e. the same memory is typically not touched by both interpreter instances). This does not, however, happen in the misleadingly named threading module, which often actually slows things down because of a process called context switching. Threading today has limited usefulness, but it provides an easier way than async Python to deal with operations that release the GIL, like socket and file reads/writes.
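To see the "copying, not sharing" point concretely, here is a minimal sketch (my addition): a list mutated in the child process stays unchanged in the parent, because each process works on its own copy of the memory:

import multiprocessing as mp

data = []

def worker():
    data.append("hello from the child")
    print("child sees:", data)

if __name__ == '__main__':
    p = mp.Process(target=worker)
    p.start()
    p.join()
    print("parent sees:", data)  # still [] - the child mutated its own copy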

Beyond all this, though, there is a bigger problem with your multiprocessing: you’re writing to standard output. You aren’t going to get the gains you want. Think about it. Each of your processes “prints” data, but it’s all being displayed in one terminal/output screen. So even if your processes are “printing”, they aren’t really doing that independently; the information has to be coalesced back into another process where the text interface lies (i.e. your console). So these processes write whatever they were going to print to some sort of buffer, which then has to be copied (as we learned from how multiprocessing works) to another process, which will then take that buffered data and output it.

Typically, dummy programs use printing as a means of showing that there is no ordering between the execution of these processes and that they can finish at different times; they aren’t meant to demonstrate the performance benefits of multi-core processing.
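If you want to see the benefit rather than the interleaving, a more telling experiment (a rough sketch, my addition) is to time a CPU-bound function run serially and through a pool:

import time
from multiprocessing import Pool, cpu_count

def cpu_bound(n):
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    jobs = [2_000_000] * 8

    start = time.perf_counter()
    serial = [cpu_bound(n) for n in jobs]
    print(f"serial: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    with Pool(processes=(cpu_count() - 1)) as pool:
        parallel = pool.map(cpu_bound, jobs)
    print(f"pool:   {time.perf_counter() - start:.2f}s")

    assert serial == parallel  # same results, measured instead of printed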

Answered By: Krupip

I have experimented a bit this week with multiprocessing. The fastest way that I discovered to do multiprocessing in Python 3 is using imap_unordered, at least in my scenario. Here is a script you can adapt to your scenario to figure out what works best for you:

import multiprocessing

NUMBER_OF_PROCESSES = multiprocessing.cpu_count()
MP_FUNCTION = 'imap_unordered'  # 'imap_unordered' or 'starmap' or 'apply_async'

def process_chunk(a_chunk):
    print(f"processig mp chunk {a_chunk}")
    return a_chunk


if __name__ == '__main__':  # guard needed on platforms that spawn new processes
    map_jobs = [1, 2, 3, 4]

    result_sum = 0

    if MP_FUNCTION == 'imap_unordered':
        pool = multiprocessing.Pool(processes=NUMBER_OF_PROCESSES)
        for i in pool.imap_unordered(process_chunk, map_jobs):
            result_sum += i
        pool.close()
        pool.join()
    elif MP_FUNCTION == 'starmap':
        pool = multiprocessing.Pool(processes=NUMBER_OF_PROCESSES)
        try:
            map_jobs = [(i, ) for i in map_jobs]
            result_sum = pool.starmap(process_chunk, map_jobs)
            result_sum = sum(result_sum)
        finally:
            pool.close()
            pool.join()
    elif MP_FUNCTION == 'apply_async':
        with multiprocessing.Pool(processes=NUMBER_OF_PROCESSES) as pool:
            result_sum = [pool.apply_async(process_chunk, [i, ]).get() for i in map_jobs]
        result_sum = sum(result_sum)
    print(f"result_sum is {result_sum}")

I found that starmap was not too far behind in performance; in my scenario it used more CPU and ended up being a bit slower. Hope this boilerplate helps.
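One knob worth trying in that experiment (my suggestion, not something measured in the answer above) is the chunksize argument of imap_unordered; with many small jobs, larger chunks cut down on inter-process overhead:

import multiprocessing

def process_chunk(a_chunk):
    return a_chunk * a_chunk

if __name__ == '__main__':
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        # hand the workers batches of 256 jobs at a time instead of one by one
        result_sum = sum(pool.imap_unordered(process_chunk, range(10_000), chunksize=256))
    print(f"result_sum is {result_sum}")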

Answered By: radtek

minimal working example: ordered execution, results asap, max n threads

The following code snippet executes some functions in parallel, subject to the following side conditions:

  • at most as many parallel workers as there are CPU cores
  • jobs can be executed by priority
  • results are printed as soon as available

Code

import time
import random
from typing import List, Callable, Dict, Any
import multiprocessing as mp
from multiprocessing.managers import DictProxy

N_THREADS = mp.cpu_count()

def process_single_task(task_name: str):
    n_sec = random.randint(0, 4)
    print(f"start {task_name=}, {n_sec=}")
    time.sleep(n_sec)
    print(f"end {task_name=}, {n_sec=}")
    return task_name, n_sec

def fct_to_multiprocessing(
        fct: Callable, fct_kwargs: Dict[str, Any], job_id: int, results: DictProxy, semaphore: mp.Semaphore):
    if semaphore is not None:
        semaphore.acquire()
    results[job_id] = fct(**fct_kwargs)
    if semaphore is not None:
        semaphore.release()

def process_all_tasks(tasks: List[str]):
    manager = mp.Manager()
    results = manager.dict()  # <class 'multiprocessing.managers.DictProxy'>
    sema = mp.Semaphore(N_THREADS)
    jobs = {}

    job_ids = list(range(len(tasks)))

    for job_id in job_ids:
        task = tasks[job_id]
        jobs[job_id] = mp.Process(
            target=fct_to_multiprocessing,
            kwargs={
                "fct": process_single_task, "fct_kwargs": {"task_name": task},
                "job_id": job_id, "results": results, "semaphore": sema
            }
        )
        jobs[job_id].start()

    for job_id in job_ids:
        job = jobs[job_id]
        job.join()
        result = results[job_id]
        print(f"job {tasks[job_id]} returned {result=}")


if __name__ == "__main__":
    tasks = list("abcdefghijklmnopqrstuvwxyz")
    process_all_tasks(tasks)

Output

start task_name='a', n_sec=4
start task_name='c', n_sec=2
end task_name='c', n_sec=2
start task_name='b', n_sec=2
end task_name='a', n_sec=4
start task_name='d', n_sec=1
job a returned result=('a', 4)
end task_name='b', n_sec=2
start task_name='e', n_sec=0
end task_name='e', n_sec=0
job b returned result=('b', 2)
job c returned result=('c', 2)
start task_name='f', n_sec=0
end task_name='f', n_sec=0
start task_name='j', n_sec=2
end task_name='d', n_sec=1
start task_name='g', n_sec=1
job d returned result=('d', 1)
job e returned result=('e', 0)
job f returned result=('f', 0)
end task_name='g', n_sec=1
start task_name='i', n_sec=3
job g returned result=('g', 1)
end task_name='j', n_sec=2
start task_name='h', n_sec=1
end task_name='h', n_sec=1
start task_name='o', n_sec=4
job h returned result=('h', 1)
end task_name='i', n_sec=3
start task_name='n', n_sec=2
job i returned result=('i', 3)
job j returned result=('j', 2)
end task_name='n', n_sec=2
start task_name='k', n_sec=2
end task_name='o', n_sec=4
start task_name='r', n_sec=1
end task_name='r', n_sec=1
start task_name='m', n_sec=1
end task_name='k', n_sec=2
start task_name='l', n_sec=4
job k returned result=('k', 2)
end task_name='m', n_sec=1
start task_name='s', n_sec=3
end task_name='s', n_sec=3
start task_name='p', n_sec=3
end task_name='l', n_sec=4
start task_name='q', n_sec=0
end task_name='q', n_sec=0
start task_name='t', n_sec=0
end task_name='t', n_sec=0
job l returned result=('l', 4)
job m returned result=('m', 1)
job n returned result=('n', 2)
job o returned result=('o', 4)
start task_name='u', n_sec=4
end task_name='p', n_sec=3
start task_name='v', n_sec=0
end task_name='v', n_sec=0
start task_name='x', n_sec=4
job p returned result=('p', 3)
job q returned result=('q', 0)
job r returned result=('r', 1)
job s returned result=('s', 3)
job t returned result=('t', 0)
end task_name='u', n_sec=4
start task_name='y', n_sec=4
job u returned result=('u', 4)
job v returned result=('v', 0)
end task_name='x', n_sec=4
start task_name='z', n_sec=0
end task_name='z', n_sec=0
start task_name='w', n_sec=1
end task_name='w', n_sec=1
job w returned result=('w', 1)
job x returned result=('x', 4)
end task_name='y', n_sec=4
job y returned result=('y', 4)
job z returned result=('z', 0)



Disclaimer: time.sleep(n_sec) here stands in for some computationally heavy function. If it's actually just waiting, asyncio is in general a better choice (even though increasing the number of workers here should do the job as well).
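For comparison, a minimal sketch (my addition) of what the asyncio variant of the dummy task above could look like when the work is pure waiting:

import asyncio
import random

async def process_single_task(task_name: str):
    n_sec = random.randint(0, 4)
    await asyncio.sleep(n_sec)  # waiting, not computing - no extra processes needed
    return task_name, n_sec

async def main():
    tasks = list("abcdefghijklmnopqrstuvwxyz")
    # run all the "tasks" concurrently in a single process
    results = await asyncio.gather(*(process_single_task(t) for t in tasks))
    print(results)

if __name__ == '__main__':
    asyncio.run(main())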

Answered By: Markus Dutschke