Finding an efficient way to distribute elements of a list between multiple processes in python

Question:

Suppose I have a list of strings like so:

a = ['string1', 'string2', ..., 'stringN']

I was wondering: what would be the most efficient way to use 8 processes to call a single function with each of the N strings in that list as its argument?

For a better understanding, consider a function like this:

def some_function(a_string: str):
    f = open("my_enemy_list!.txt", "a")
    f.write(a_string)
    f.close()

Rest assured I’ll use some mutex/lock to keep the processes from stepping on each other’s writes.
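For example, something along these lines (just a rough sketch of the locking I have in mind; the details are placeholders):

from multiprocessing import Pool, Lock


def init_pool(the_lock):
    # Give every worker process access to the shared lock.
    global lock
    lock = the_lock


def some_function(a_string: str):
    # Hold the lock while writing so processes don't interleave their output.
    with lock:
        with open("my_enemy_list!.txt", "a") as f:
            f.write(a_string)


if __name__ == '__main__':
    a = ['string1', 'string2']  # placeholder for the real list
    with Pool(8, initializer=init_pool, initargs=(Lock(),)) as pool:
        pool.map(some_function, a)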

(This is on Linux, and the list is neither very large nor very small.)

Asked By: John


Answers:

This should be enough to get you started. If the processes need to interact, it becomes a bit more interesting, but for an embarrassingly parallel problem this is all you need.

from concurrent.futures import ProcessPoolExecutor


def load_function(item):
    # Stand-in for the real per-item work.
    print(item)


if __name__ == '__main__':
    SAMPLE_DATA = list(range(25))
    # 8 worker processes; map() distributes the items across them.
    with ProcessPoolExecutor(8) as ppe:
        ppe.map(load_function, SAMPLE_DATA)
Answered By: Carbon

The simplest way is to pass that list directly to Pool.map(), and let it split up the list into chunks and distribute them to workers.

Pool.map() doesn’t assign all the chunks to specific workers up front, but pinning work to workers like that is usually undesirable anyway. You want workers to stay flexible and request more work if they finish earlier than expected. The more constraints you put on which process can do which work, the slower your code will be.

But why break the list into chunks at all? Why not give each worker one item and hand out more as they finish? The reason is IPC (inter-process communication) overhead: every time a worker has to request more work and wake up the parent process, that takes time, and the worker sits idle while it waits.

By default, chunksize is set to the number of items divided by the number of processes divided by 4, rounded up. In your case, with 8 processes, that splits your list into roughly 32 parts. This is normally a pretty good default, but in some cases you can do better by tuning it. I usually run the program with several different chunksize values and pick whichever is fastest.
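For concreteness, here is a sketch of that (the list contents, the pool size of 8, the body of some_function and the explicit chunksize value are all illustrative):

from multiprocessing import Pool


def some_function(a_string: str) -> None:
    ...  # whatever per-string work you need


if __name__ == '__main__':
    a = [f'string{i}' for i in range(1000)]  # placeholder for your N strings
    with Pool(8) as pool:
        # With 1000 items and 8 workers the default chunksize is
        # 1000 // (8 * 4) + 1 = 32; pass chunksize explicitly to try
        # other values and time them.
        pool.map(some_function, a, chunksize=32)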

Answered By: Nick ODell

You did not specify how large the strings being written will be. Since you are running under Linux, if the data is not too large, explicit locking is not necessary. See this post. But whether the writes are atomic or you have to lock explicitly, all processes would be writing to the same file, so no real parallelization is gained in outputting the data. For that reason I would find it simpler to have a single writer.

If the order in which the strings are written does not matter, I would use the following code:

from multiprocessing import Pool, Queue, cpu_count

def init_pool_processes(q: Queue) -> None:
    global queue

    queue = q

def some_function(a_string: str) -> None:
    ... # Perform some CPU-intensive operations yielding result
    result = a_string.upper() + '\n' # for demo purposes
    queue.put(result)


def writer() -> None:
    with open("my_enemy_list!.txt", "w") as f:
        for result in iter(queue.get, None):
            f.write(result)

def main():
    a = [f'string{i}' for i in range(1, 101)]
    queue = Queue()
    with Pool(cpu_count() + 1, initializer=init_pool_processes, initargs=(queue,)) as pool:
        async_result = pool.apply_async(writer)
        pool.map(some_function, a)
        # Tell writer there is no more data coming:
        queue.put(None)
        # Wait for writer to complete:
        async_result.get()

if __name__ == '__main__':
    main()

If the order does matter, then:

from multiprocessing import Pool, cpu_count


def some_function(a_string: str) -> str:
    ... # Perform some CPU-intensive operations yielding result
    result = a_string.upper() + '\n' # for demo purposes
    return result

def compute_chunksize(iterable_size: int, pool_size: int) -> int:
    chunksize, remainder = divmod(iterable_size, 4 * pool_size)
    if remainder:
        chunksize += 1
    return chunksize

def main():
    a = [f'string{i}' for i in range(1, 101)]
    iterable_size = len(a)
    pool_size = cpu_count()
    chunksize = compute_chunksize(iterable_size, pool_size)

    with Pool(pool_size) as pool, \
            open("my_enemy_list!.txt", "w") as f:
        # imap (unlike imap_unordered) yields results in input order:
        for result in pool.imap(some_function, a, chunksize=chunksize):
            f.write(result)

if __name__ == '__main__':
    main()
Answered By: Booboo