Python 3 Multiprocessing queue deadlock when calling join before the queue is empty

Question:

I have a question about how the queue in the multiprocessing module works in Python 3.

This is what they say in the programming guidelines:

Bear in mind that a process that has put items in a queue will wait before
terminating until all the buffered items are fed by the “feeder” thread to
the underlying pipe. (The child process can call the Queue.cancel_join_thread
method of the queue to avoid this behaviour.)

This means that whenever you use a queue you need to make sure that all
items which have been put on the queue will eventually be removed before the
process is joined. Otherwise you cannot be sure that processes which have
put items on the queue will terminate. Remember also that non-daemonic
processes will be joined automatically.

An example which will deadlock is the following:

from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    p.join()                    # this deadlocks
    obj = queue.get()

A fix here would be to swap the last two lines (or simply remove the
p.join() line).

So apparently, queue.get() should not be called after a join().

However there are examples of using queues where get is called after a join like:

import multiprocessing as mp
import random
import string

# Define an example function
def rand_string(length, output):
    """Generate a random string of digits and lower- and uppercase letters."""
    rand_str = ''.join(random.choice(
                    string.ascii_lowercase
                    + string.ascii_uppercase
                    + string.digits)
                for i in range(length))
    output.put(rand_str)

if __name__ == "__main__":
    # Define an output queue
    output = mp.Queue()

    # Set up a list of processes that we want to run
    processes = [mp.Process(target=rand_string, args=(5, output))
                 for x in range(2)]

    # Run processes
    for p in processes:
        p.start()

    # Exit the completed processes
    for p in processes:
        p.join()

    # Get process results from the output queue
    results = [output.get() for p in processes]

    print(results)

I’ve run this program and it works (it was also posted as a solution to the Stack Overflow question Python 3 – Multiprocessing – Queue.get() does not respond).

Could someone help me understand what the rule for the deadlock is here?

Asked By: markk


Answers:

The queue implementation in multiprocessing that allows data to be transferred between processes relies on standard OS pipes.

OS pipes have a limited capacity, so the process that queues data can end up blocked in the OS while its data is written to the pipe, until some other process uses get() to retrieve data from the queue.
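
To get a feel for how small that limit is, here is a rough sketch that fills a plain OS pipe until a write would block. It assumes a Unix-like system (os.set_blocking on pipes is not available on older Windows Pythons), and pipe_capacity is just a name made up for this illustration, not part of multiprocessing.

```python
import os

def pipe_capacity(chunk_size=1024):
    """Return how many bytes fit into a fresh OS pipe before a write would block."""
    read_fd, write_fd = os.pipe()
    try:
        # Make writes raise BlockingIOError instead of blocking forever
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # Nobody is reading, so the pipe is now full
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)

if __name__ == "__main__":
    print(pipe_capacity())
```

On Linux the result is often 64 KiB, which is far less than the 1,000,000-character string in the documentation example, so that example cannot fit into the pipe without a reader on the other end.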

For small amounts of data, such as in your example, the main process can join() all the spawned subprocesses and then pick up the data. This often works, but it does not scale, and the point at which it breaks depends on the capacity of the OS pipe, which varies between platforms.

But it will certainly break with large amounts of data. The subprocess cannot exit until its buffered data has been written to the pipe, and that write blocks until the main process removes some data from the queue with get(), but the main process is blocked in join() waiting for the subprocess to finish. This results in a deadlock.
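
A minimal sketch of the safe ordering, assuming each worker puts exactly one item on the queue so the main process knows how many get() calls to make. The names worker and collect are made up for this illustration.

```python
from multiprocessing import Process, Queue

def worker(q, size):
    # One large item per worker, far bigger than a typical pipe buffer
    q.put('X' * size)

def collect(num_workers=2, size=1000000):
    q = Queue()
    procs = [Process(target=worker, args=(q, size)) for _ in range(num_workers)]
    for p in procs:
        p.start()
    # Drain the queue first: one get() per worker, since each puts one item
    results = [q.get() for _ in procs]
    # Only now is it safe to join, because nothing is left buffered
    for p in procs:
        p.join()
    return results

if __name__ == '__main__':
    print([len(r) for r in collect()])
```

The only difference from the deadlocking version is that the get() calls come before the join() calls.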

Here is an example where a user had this exact issue. I posted some code in an answer there that helped him solve his problem.

Answered By: Patrick Maupin

Don’t call join() on a process object before you have retrieved all messages from the shared queue.

I used the following workaround to allow processes to exit before all of their results have been processed:

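import queue  # needed for the queue.Empty exception

results = []
while True:
    try:
        # Wait up to 10 ms for a message; the timeout only applies when block is True
        result = resultQueue.get(True, 0.01)
        results.append(result)
    except queue.Empty:
        pass
    # Check whether every process has exited
    allExited = True
    for t in processes:
        if t.exitcode is None:
            allExited = False
            break
    if allExited and resultQueue.empty():
        break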

It can be shortened but I left it longer to be more clear for newbies.

Here resultQueue is the multiprocessing.Queue that was shared with the multiprocessing.Process objects. After this block of code, the results list contains all the messages from the queue.

The problem is that the input buffer of the queue's pipe may become full, causing the writer(s) to block indefinitely until there is enough space for the next message. So you have three ways to avoid blocking:

  • Increase multiprocessing.connection.BUFSIZE (not so good)
  • Reduce the size or number of messages (not so good)
  • Fetch messages from the queue immediately as they come (good way)
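
If the number of messages is not known in advance, one way to fetch them as they come is to have every worker finish with a marker value. This is only a sketch of that idea: SENTINEL, worker and gather are names made up for the illustration, and it assumes None is never a real result.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical marker meaning "this worker is done"

def worker(worker_id, n_items, out):
    for i in range(n_items):
        out.put((worker_id, i))
    out.put(SENTINEL)  # tell the consumer this worker has nothing more to send

def gather(n_workers=3, n_items=1000):
    out = mp.Queue()
    procs = [mp.Process(target=worker, args=(w, n_items, out))
             for w in range(n_workers)]
    for p in procs:
        p.start()
    results = []
    finished = 0
    # Keep reading until every worker has sent its sentinel
    while finished < n_workers:
        item = out.get()  # blocks until a message arrives
        if item is SENTINEL:
            finished += 1
        else:
            results.append(item)
    # The queue is fully drained, so joining cannot deadlock
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(len(gather()))
```

Compared with polling exitcode, this blocks in get() instead of waking up every few milliseconds, at the cost of requiring each worker to send the marker even when it fails early.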

Answered By: Alexander Pravdin