Joblib Parallel doesn't terminate processes
Question:
I run the code in parallel in the following fashion:
grouped_data = Parallel(n_jobs=14)(delayed(function)(group) for group in grouped_data)
After the computation is done, I can see in a system monitor that all the spawned processes are still alive and consuming memory:
And none of these processes are killed until the main process terminates, which leads to a memory leak.
If I do the same with multiprocessing.Pool in the following way:
pool = Pool(14)
pool.map(apply_wrapper, np.array_split(groups, 14))
pool.close()
pool.join()
Then I see that all the spawned processes are terminated in the end and no memory is leaked.
However, I need joblib and its loky backend, since it is able to serialize some local functions.
How can I forcefully kill processes spawned by joblib.Parallel and release memory?
My environment is the following: Python 3.8, Ubuntu Linux.
Answers:
Here is what I can wrap up after investigating this myself:
- joblib.Parallel is not obliged to terminate its worker processes after a successful single invocation
- The loky backend does not physically terminate its workers; this is an intentional design choice explained by the authors: Loky Code Line
- If you want to explicitly release the workers, you can use my snippet:
import psutil

current_process = psutil.Process()
subproc_before = set([p.pid for p in current_process.children(recursive=True)])

grouped_data = Parallel(n_jobs=14)(delayed(function)(group) for group in grouped_data)

subproc_after = set([p.pid for p in current_process.children(recursive=True)])
# terminate only the workers that appeared during the Parallel call
for subproc in subproc_after - subproc_before:
    print('Killing process with pid {}'.format(subproc))
    psutil.Process(subproc).terminate()
- The code above is not thread-/process-safe. If you have another source of spawned subprocesses, you should block its execution while the cleanup runs.
- Everything above is valid for joblib version 1.0.1
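To avoid repeating the before/after bookkeeping, the snippet above can be wrapped in a small context manager. This is only a sketch with the same caveats: it needs psutil installed, and it is not thread-/process-safe, so it may kill unrelated children spawned concurrently.

```python
# Sketch: wrap the PID-diff cleanup from the snippet above in a context
# manager. Any subprocess that appears while the 'with' block runs (e.g.
# loky workers spawned by joblib.Parallel) is terminated on exit.
import contextlib
import psutil

@contextlib.contextmanager
def terminate_spawned_children():
    me = psutil.Process()
    before = {p.pid for p in me.children(recursive=True)}
    try:
        yield
    finally:
        after = {p.pid for p in me.children(recursive=True)}
        for pid in after - before:
            # the process may already be gone by the time we get here
            with contextlib.suppress(psutil.NoSuchProcess):
                psutil.Process(pid).terminate()
```

Usage would then be `with terminate_spawned_children(): grouped_data = Parallel(n_jobs=14)(delayed(function)(group) for group in grouped_data)`.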
So, taking into account point 2 of Иван Судос's answer, would it be judicious to create a new class that wraps LokyBackend and overrides its terminate() method?
e.g.,
class MyLokyWrapper(LokyBackend):

    def terminate(self):
        if self._workers is not None:
            self._workers.terminate(kill_workers=False)
            # if kill_workers were True, joblib would "brutally" terminate
            # the remaining workers and their descendants using SIGKILL
            self._workers = None
        self.reset_batch_stats()
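If you go that route, a self-contained sketch could register the wrapper as a named backend via joblib's register_parallel_backend. Note the assumptions here: LokyBackend lives in the private module joblib._parallel_backends, and the backend name 'loky_release' is made up for this example, so details may shift between joblib versions.

```python
# Sketch: register a LokyBackend subclass whose terminate() shuts the
# workers down instead of keeping them alive for reuse.
# Caution: LokyBackend is imported from a private joblib module.
from joblib import Parallel, delayed, parallel_backend, register_parallel_backend
from joblib._parallel_backends import LokyBackend

class MyLokyWrapper(LokyBackend):

    def terminate(self):
        if self._workers is not None:
            # kill_workers=False requests a graceful shutdown; True would
            # SIGKILL the remaining workers and their descendants
            self._workers.terminate(kill_workers=False)
            self._workers = None
        self.reset_batch_stats()

# 'loky_release' is an arbitrary name chosen for this example
register_parallel_backend('loky_release', MyLokyWrapper)

with parallel_backend('loky_release'):
    results = Parallel(n_jobs=2)(delayed(pow)(i, 2) for i in range(5))
print(results)
```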