Joblib Parallel doesn't terminate processes
Question:
I run the code in parallel in the following fashion:
grouped_data = Parallel(n_jobs=14)(delayed(function)(group) for group in grouped_data)
After the computation is done, I can see in a system monitor that all the spawned processes are still alive and consuming memory:
And none of these processes are killed until the main process terminates, which leads to a memory leak.
If I do the same with multiprocessing.Pool in the following way:
pool = Pool(14)
pool.map(apply_wrapper, np.array_split(groups, 14))
pool.close()
pool.join()
Then I see that all the spawned processes are terminated in the end and no memory is leaked.
However, I need joblib and its loky backend, since it is able to serialize some local functions.
How can I forcefully kill processes spawned by joblib.Parallel and release memory?
My environment is the following: Python 3.8, Ubuntu Linux.
Answers:
Here is what I can wrap up after investigating this myself:
- joblib.Parallel is not obliged to terminate its worker processes after a successful single invocation
- The loky backend does not physically terminate its workers; this is an intentional design choice explained by the authors: Loky Code Line
- If you want to explicitly release the workers, you can use my snippet:
import psutil

current_process = psutil.Process()
subproc_before = set([p.pid for p in current_process.children(recursive=True)])

grouped_data = Parallel(n_jobs=14)(delayed(function)(group) for group in grouped_data)

subproc_after = set([p.pid for p in current_process.children(recursive=True)])
# terminate only the workers that appeared during the Parallel call
for subproc in subproc_after - subproc_before:
    print('Killing process with pid {}'.format(subproc))
    psutil.Process(subproc).terminate()
- The code above is not thread-/process-safe. If you have another source of spawned subprocesses, you should block its execution while the cleanup runs.
- Everything above is valid for joblib version 1.0.1
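To avoid repeating the before/after bookkeeping, the snippet above can be wrapped in a small context manager. This is only a sketch with the same caveats: it needs psutil installed, and it is not thread-/process-safe, so it may kill unrelated children spawned concurrently.

```python
# Sketch: wrap the PID-diff cleanup from the snippet above in a context
# manager. Any subprocess that appears while the 'with' block runs (e.g.
# loky workers spawned by joblib.Parallel) is terminated on exit.
import contextlib
import psutil

@contextlib.contextmanager
def terminate_spawned_children():
    me = psutil.Process()
    before = {p.pid for p in me.children(recursive=True)}
    try:
        yield
    finally:
        after = {p.pid for p in me.children(recursive=True)}
        for pid in after - before:
            # the process may already be gone by the time we get here
            with contextlib.suppress(psutil.NoSuchProcess):
                psutil.Process(pid).terminate()
```

Usage would then be `with terminate_spawned_children(): grouped_data = Parallel(n_jobs=14)(delayed(function)(group) for group in grouped_data)`.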
So, taking into account point 2 of Иван Судос's answer, would it be judicious to create a new class that wraps LokyBackend and overrides its terminate() method?
e.g.,
class MyLokyWrapper(LokyBackend):

    def terminate(self):
        if self._workers is not None:
            self._workers.terminate(kill_workers=False)
            # if kill_workers were True, joblib would "brutally" terminate
            # the remaining workers and their descendants using SIGKILL
            self._workers = None
        self.reset_batch_stats()
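If you go that route, a self-contained sketch could register the wrapper as a named backend via joblib's register_parallel_backend. Note the assumptions here: LokyBackend lives in the private module joblib._parallel_backends, and the backend name 'loky_release' is made up for this example, so details may shift between joblib versions.

```python
# Sketch: register a LokyBackend subclass whose terminate() shuts the
# workers down instead of keeping them alive for reuse.
# Caution: LokyBackend is imported from a private joblib module.
from joblib import Parallel, delayed, parallel_backend, register_parallel_backend
from joblib._parallel_backends import LokyBackend

class MyLokyWrapper(LokyBackend):

    def terminate(self):
        if self._workers is not None:
            # kill_workers=False requests a graceful shutdown; True would
            # SIGKILL the remaining workers and their descendants
            self._workers.terminate(kill_workers=False)
            self._workers = None
        self.reset_batch_stats()

# 'loky_release' is an arbitrary name chosen for this example
register_parallel_backend('loky_release', MyLokyWrapper)

with parallel_backend('loky_release'):
    results = Parallel(n_jobs=2)(delayed(pow)(i, 2) for i in range(5))
print(results)
```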