Python Multiprocess good practices

Question:

I am working on a piece of software that processes a lot of data, so we've used multiprocessing to speed up the processing. Here is my current code:

for file_to_process in file_partition:
    if len(file_to_process) > 0:
        logger = self.__get_logger__(task.lower(), len(proc_list) + 1)
        proc = Process(target=self.__launch_thread__, args=(task, manager_class, file_to_process, list_action, str_target_path, str_target_buffer_path, logger))
        proc.start()
        proc_list.append(proc)

for proc in proc_list:
    proc.join()

The thing is, we've noticed that our processes are not releasing memory after they finish, so I was thinking about doing this instead:

for file_to_process in file_partition:
    if len(file_to_process) > 0:
        logger = self.__get_logger__(task.lower(), len(proc_list) + 1)
        proc = Process(target=self.__launch_thread__, args=(task, manager_class, file_to_process, list_action, str_target_path, str_target_buffer_path, logger))
        proc.start()
        proc_list.append(proc)

for proc in proc_list:
    if not proc.is_alive():
        proc.close()

I've also seen that I could use a Pool, but I am not sure what the benefit would be. I am also not really sure how the join() method works; I've read the docs, but it seems that the processes don't really wait for each other.

So if someone can enlighten me on how to use the multiprocessing library properly, that would be great.

Asked By: FrozzenFinger


Answers:

Looking at the code for Process, calling close() on a process that is still running raises the following error:

Cannot close a process while it is still running. You should first call join() or terminate().

So simply modify your existing code to be:

for proc in proc_list:
    proc.join()
    proc.close()
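To see why join() before close() matters: join() blocks the parent until that child exits, so by the time close() runs the process is guaranteed to be dead. A minimal, self-contained sketch (the worker function and its sleep are placeholders, not the asker's code):

```python
import time
from multiprocessing import Process

def work(n):
    # stand-in for real per-file processing
    time.sleep(0.2)

if __name__ == "__main__":
    procs = [Process(target=work, args=(i,)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()               # blocks until this child has exited
        assert not p.is_alive()
        p.close()              # now legal: frees the Process object's resources
    print("all children finished")
```

Note that join() on an already-finished process returns immediately, so looping join() over the list still waits only as long as the slowest child.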

Whether that will help much with the memory usage remains to be seen.

Answered By: Booboo