Finding an efficient way to distribute elements of a list between multiple processes in Python
Question:
Suppose I have a list of strings like so:
a = ['string1', 'string2', ..., 'stringN']
What would be the most efficient way to have 8 processes consume the N strings in this list, passing each string as the argument to a single function?
For better understanding consider the function like this:
def some_function(a_string: str):
    f = open("my_enemy_list!.txt", "a")
    f.write(a_string)
    f.close()
Rest assured, I’ll use a mutex lock to protect against interleaved writes.
(in Linux and not a very large or small list size)
Answers:
This should be enough to get you started. If the processes need to interact, it becomes a bit more interesting, but for an embarrassingly parallel problem this is all you need.
from concurrent.futures import ProcessPoolExecutor

def load_function(item):
    print(item)

if __name__ == '__main__':
    SAMPLE_DATA = list(range(25))
    with ProcessPoolExecutor(8) as ppe:
        ppe.map(load_function, SAMPLE_DATA)
The simplest way is to pass that list directly to Pool.map(), and let it split up the list into chunks and distribute them to workers.
This doesn’t assign the chunks to specific workers all at once, but that’s usually undesirable anyway. You want the workers to be flexible, and to request more work if they finish their work earlier than expected. The more constraints you put on which process can do which work, the slower your code will be.
But why break the list into chunks? Why not give each worker one item, then give them more when they finish? The reason for this is IPC (inter-process communication) overhead. Every time a worker needs to request more work, and wake up the parent process, that takes some time, and the worker is idle during that.
By default, chunksize is set to the number of items, divided by the number of processes, divided by 4. In your case, that would lead to dividing your list into 32 parts. This is normally a pretty good default, but you can do better in some cases by tuning it. I usually run my program with different chunksize values, and pick the one which is fastest.
You did not specify what platform you are running under or how large the data string being written will be. If you are running under Linux and the data is not too large, explicit locking is not necessary. See this post. But whether the writing is atomic or you have to do explicit locking, since all processes would be writing to the same file, there is no real parallelization being accomplished in outputting the data. For that reason I would find it simpler to have a single writer.
If the order in which the strings are written does not matter, I would use the following code:
from multiprocessing import Pool, Queue, cpu_count

def init_pool_processes(q: Queue) -> None:
    global queue
    queue = q

def some_function(a_string: str) -> None:
    ...  # Perform some CPU-intensive operations yielding result
    result = a_string.upper() + '\n'  # for demo purposes
    queue.put(result)

def writer() -> None:
    with open("my_enemy_list!.txt", "w") as f:
        for result in iter(queue.get, None):
            f.write(result)

def main():
    a = [f'string{i}' for i in range(1, 101)]
    queue = Queue()
    with Pool(cpu_count() + 1, initializer=init_pool_processes, initargs=(queue,)) as pool:
        async_result = pool.apply_async(writer)
        pool.map(some_function, a)
        # Tell writer there is no more data coming:
        queue.put(None)
        # Wait for writer to complete:
        async_result.get()

if __name__ == '__main__':
    main()
If the order does matter, then:
from multiprocessing import Pool, cpu_count

def some_function(a_string: str) -> str:
    ...  # Perform some CPU-intensive operations yielding result
    result = a_string.upper() + '\n'  # for demo purposes
    return result

def compute_chunksize(iterable_size: int, pool_size: int) -> int:
    chunksize, remainder = divmod(iterable_size, 4 * pool_size)
    if remainder:
        chunksize += 1
    return chunksize

def main():
    a = [f'string{i}' for i in range(1, 101)]
    iterable_size = len(a)
    pool_size = cpu_count()
    chunksize = compute_chunksize(iterable_size, pool_size)
    with Pool(pool_size) as pool, \
            open("my_enemy_list!.txt", "w") as f:
        # imap (unlike imap_unordered) yields results in submission order:
        for result in pool.imap(some_function, a, chunksize=chunksize):
            f.write(result)

if __name__ == '__main__':
    main()