What exactly is Python multiprocessing Module's .join() Method Doing?

Question:

Learning about Python Multiprocessing (from a PMOTW article) and would love some clarification on what exactly the join() method is doing.

In an old tutorial from 2008 it states that without the p.join() call in the code below, “the child process will sit idle and not terminate, becoming a zombie you must manually kill”.

from multiprocessing import Process

def say_hello(name='world'):
    print "Hello, %s" % name

p = Process(target=say_hello)
p.start()
p.join()

I added a printout of the PID as well as a time.sleep to test and as far as I can tell, the process terminates on its own:

from multiprocessing import Process
import sys
import time

def say_hello(name='world'):
    print "Hello, %s" % name
    print 'Starting:', p.name, p.pid
    sys.stdout.flush()
    print 'Exiting :', p.name, p.pid
    sys.stdout.flush()
    time.sleep(20)

p = Process(target=say_hello)
p.start()
# no p.join()

within 20 seconds:

936 ttys000    0:00.05 /Library/Frameworks/Python.framework/Versions/2.7/Reso
938 ttys000    0:00.00 /Library/Frameworks/Python.framework/Versions/2.7/Reso
947 ttys001    0:00.13 -bash

after 20 seconds:

947 ttys001    0:00.13 -bash

Behavior is the same with p.join() added back at end of the file. Python Module of the Week offers a very readable explanation of the module; “To wait until a process has completed its work and exited, use the join() method.”, but it seems like at least OS X was doing that anyway.

Am also wondering about the name of the method. Is the .join() method concatenating anything here? Is it concatenating a process with its end? Or does it just share a name with Python’s native .join() method?

Asked By: MikeiLL


Answers:

Without the join(), the main process can complete before the child process does. I’m not sure under what circumstances that leads to zombieism.

The main purpose of join() is to ensure that a child process has completed before the main process does anything that depends on the work of the child process.
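
To make that concrete, here is a minimal sketch (Python 3 syntax, unlike the Python 2 snippets in the question; the names compute and run and the value 42 are made up for illustration). The child publishes its result through a shared multiprocessing.Value, and the parent can only rely on that result after join() has returned:

```python
import multiprocessing
import time


def compute(result):
    # Pretend to do some slow work, then publish the answer.
    time.sleep(0.2)
    result.value = 42


def run(wait_first):
    # Shared integer, initialised to 0, visible to both parent and child.
    result = multiprocessing.Value("i", 0)
    p = multiprocessing.Process(target=compute, args=(result,))
    p.start()
    if wait_first:
        p.join()          # block until the child has exited
    seen = result.value   # what the parent sees at this point
    p.join()              # always wait for the child before returning
    return seen


if __name__ == "__main__":
    print(run(wait_first=True))   # 42: the child is guaranteed to be done
    print(run(wait_first=False))  # may still be 0: the child is probably asleep
```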

The etymology of join() is that it’s the opposite of fork, which is the common term in Unix-family operating systems for creating child processes. A single process “forks” into several, then “joins” back into one.

Answered By: Russell Borogove

The join() method, when used with threading or multiprocessing, is not related to str.join(); it’s not actually concatenating anything together. Rather, it just means “wait for this [thread/process] to complete”. The name join is used because the multiprocessing module’s API is meant to look as similar as possible to the threading module’s API, and the threading module uses join for its Thread object. Using the term join to mean “wait for a thread to complete” is common across many programming languages, so Python just adopted it as well.
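
In other words, join() is a blocking wait, optionally with a timeout. A small sketch of that behaviour (Python 3 syntax; nap and demo are illustrative names, not something from the question):

```python
import multiprocessing
import time


def nap(seconds):
    time.sleep(seconds)


def demo():
    p = multiprocessing.Process(target=nap, args=(0.5,))
    p.start()

    p.join(0.05)                  # wait at most 0.05 s, then give up
    still_running = p.is_alive()  # a timed-out join() does not stop the child

    p.join()                      # no timeout: block until the child exits
    return still_running, p.is_alive(), p.exitcode


if __name__ == "__main__":
    # Expect the child to be alive after the short join(), and to have
    # exited with code 0 after the second one.
    print(demo())
```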

Now, the reason you see the 20 second delay both with and without the call to join() is because by default, when the main process is ready to exit, it will implicitly call join() on all running multiprocessing.Process instances. This isn’t as clearly stated in the multiprocessing docs as it should be, but it is mentioned in the Programming Guidelines section:

Remember also that non-daemonic processes will be automatically be
joined.

You can override this behavior by setting the daemon flag on the Process to True prior to starting the process:

p = Process(target=say_hello)
p.daemon = True
p.start()
# Both parent and child will exit here, since the main process has completed.

If you do that, the child process will be terminated as soon as the main process completes:

daemon

The process’s daemon flag, a Boolean value. This must be set before
start() is called.

The initial value is inherited from the creating process.

When a process exits, it attempts to terminate all of its daemonic
child processes.
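
If you want to see the difference for yourself, here is a rough sketch (Python 3; the embedded script, the DAEMON placeholder and the run_script helper are all made up for this illustration). It runs the same tiny program twice in a fresh interpreter, once with a non-daemonic child and once with a daemonic one, and captures what gets printed:

```python
import os
import subprocess
import sys
import tempfile

# A tiny stand-alone script; DAEMON is replaced with True or False below.
SCRIPT = """
import multiprocessing
import time

def work():
    time.sleep(1)
    print("child done")

if __name__ == "__main__":
    p = multiprocessing.Process(target=work)
    p.daemon = DAEMON
    p.start()
    print("main done")   # last statement of the main process, no join()
"""


def run_script(daemon):
    # Write the script to a temporary file and run it in a new interpreter.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(SCRIPT.replace("DAEMON", str(daemon)))
        path = f.name
    try:
        return subprocess.check_output([sys.executable, path]).decode()
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Non-daemonic child: the interpreter waits, so both lines appear.
    print(run_script(daemon=False))
    # Daemonic child: it is terminated at exit, so only "main done" appears.
    print(run_script(daemon=True))
```

The non-daemonic run takes about a second to return even though the script never calls join() itself, which matches the 20 second delay seen in the question.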

Answered By: dano

I’m not going to explain in detail what join does, but here’s the etymology and the intuition behind it, which should help you remember its meaning more easily.

The idea is that execution "forks" into multiple processes of which one is the main/primary process, the rest workers (or minor/secondary). When the workers are done, they "join" the main process so that serial execution may be resumed.

The join() causes the main process to wait for a worker to join it. The method might better have been called "wait", since that’s the actual behavior it causes in the master (and that’s what it’s called in POSIX, although POSIX threads call it "join" as well). The joining only occurs as an effect of the threads cooperating properly; it’s not something the main process does.
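
For the curious, the fork/wait pairing can be shown directly with the os module. This is only an illustration of the idea on Unix-like systems (os.fork does not exist on Windows), not a description of how multiprocessing is implemented; fork_and_wait and the exit status 7 are arbitrary choices for the example:

```python
import os


def fork_and_wait():
    pid = os.fork()              # execution "forks" into parent and child
    if pid == 0:
        # Child: this is where the work would go; exit with status 7.
        os._exit(7)
    # Parent: block until the child has exited, then collect its status.
    # Between the child's exit and this call, the kernel keeps the finished
    # child around as a zombie entry so that its status can still be read.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(fork_and_wait())       # prints 7
```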

The names "fork" and "join" have been used with this meaning in multiprocessing since 1963.

Answered By: Fred Foo

join() is used to wait for the worker processes to exit. With a multiprocessing.Pool, one must call close() or terminate() before using join(); a plain Process.join() has no such requirement.

As @Russell mentioned, join is like the opposite of fork (which spawns sub-processes).

For a pool's join() to run, you first have to call close(), which prevents any more tasks from being submitted to the pool; the workers then exit once all tasks complete. Alternatively, calling terminate() stops all worker processes immediately.
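
The close()-then-join() ordering described here is the one used by multiprocessing.Pool. A minimal sketch of it (Python 3 syntax; square and run_pool are names invented for the example):

```python
import multiprocessing


def square(x):
    return x * x


def run_pool():
    pool = multiprocessing.Pool(processes=2)
    async_result = pool.map_async(square, range(5))
    pool.close()   # no more tasks may be submitted to the pool
    pool.join()    # wait for the workers to finish queued tasks and exit
    return async_result.get()


if __name__ == "__main__":
    print(run_pool())   # [0, 1, 4, 9, 16]
```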

"The child process will sit idle and not terminate, becoming a zombie you must manually kill": this is possible when the main (parent) process exits while the child process is still running; once the child completes, it has no parent process to return its exit status to.

Answered By: Ani Menon

The join() call ensures that subsequent lines of your code do not run until all the child processes have completed.

For example, without the join() calls, the following code would call restart_program() before the processes finish. The processes would effectively run asynchronously, which is not what we want here (you can try it):

import multiprocessing

num_processes = 5
processes = []

# calculate_stuff and restart_program are assumed to be defined elsewhere.
for i in range(num_processes):
    p = multiprocessing.Process(target=calculate_stuff, args=(i,))
    p.start()
    processes.append(p)

for p in processes:
    p.join()  # block until this process has finished

# Reached only after every process has been joined.
restart_program()

Answered By: Yi Xiang Chong

To wait until a process has completed its work and exited, use the join() method.

and

Note: It is important to join() the process after terminating it in order to give the background machinery time to update the status of the object to reflect the termination.

This is a good example that helped me understand it: here

One thing I noticed personally was that, when I used join(), my main process paused until the child had finished, which defeated the point of using multiprocessing.Process() in the first place.
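
That pause is what you get when join() is called right after start(). A rough sketch of the difference (Python 3 syntax; work, join_immediately and join_at_end are names made up for this comparison, and the timings will vary by machine):

```python
import multiprocessing
import time


def work():
    time.sleep(0.3)


def join_immediately(n):
    # start() followed directly by join(): each worker runs on its own.
    begin = time.monotonic()
    for _ in range(n):
        p = multiprocessing.Process(target=work)
        p.start()
        p.join()
    return time.monotonic() - begin


def join_at_end(n):
    # Start every worker first, then wait for all of them.
    begin = time.monotonic()
    procs = [multiprocessing.Process(target=work) for _ in range(n)]
    for p in procs:
        p.start()
    # The main process is free to do its own work here.
    for p in procs:
        p.join()
    return time.monotonic() - begin


if __name__ == "__main__":
    print("join immediately: %.2f s" % join_immediately(3))
    print("join at end:      %.2f s" % join_at_end(3))
```

Joining only once everything has been started keeps the workers running side by side, so the second timing should come out noticeably shorter than the first.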

Answered By: Josh