How to stop subprocesses that communicate with the main process through request and response queues?

Question:

I have a Python program that starts N subprocesses (clients) which send requests to and listen for responses from the main process (server). Interprocess communication uses pipes wrapped in multiprocessing.Queue objects, according to the following scheme (one queue per consumer, so one request queue and N response queues):

                1 req_queue
                              <-- Process-1
MainProcess <-- ============= <-- …
                              <-- Process-N

                N resp_queues
            --> ============= --> Process-1
MainProcess --> ============= --> …
            --> ============= --> Process-N

The (simplified) program:

import multiprocessing


def work(event, req_queue, resp_queue):
    while not event.is_set():
        name = multiprocessing.current_process().name
        x = 3
        req_queue.put((name, x))
        print(name, 'input:', x)
        y = resp_queue.get()
        print(name, 'output:', y)


if __name__ == '__main__':
    event = multiprocessing.Event()
    req_queue = multiprocessing.Queue()
    resp_queues = {}
    processes = {}
    N = 10
    for _ in range(N):  # start N subprocesses
        resp_queue = multiprocessing.Queue()
        process = multiprocessing.Process(
            target=work, args=(event, req_queue, resp_queue))
        resp_queues[process.name] = resp_queue
        processes[process.name] = process
        process.start()
    for _ in range(100):  # handle 100 requests
        (name, x) = req_queue.get()
        y = x ** 2
        resp_queues[name].put(y)
    event.set()  # stop the subprocesses
    for process in processes.values():
        process.join()

The problem that I am facing is that the execution of this program (under Python 3.11.2) sometimes never stops: some subprocess hangs at the line y = resp_queue.get() once the main process notifies the subprocesses to stop at the line event.set(). The problem is the same if I use the threading library instead of the multiprocessing library.

How to stop the subprocesses?

Asked By: Maggyero


Answers:

queue.get() is a blocking call: a thread (or process) that reaches it waits until an item is put on the queue, and it will not be woken up by setting the event if it has already reached the get() line.

The way this is usually done (even in the standard modules) is to send None (or another sentinel object) on the queue to wake the processes waiting on it and have them terminate when there is no more work.

event.set()
for resp_queue in resp_queues.values():  # resp_queues is a dict, so iterate its values
    resp_queue.put(None)

This makes your event only useful for early termination, but if early termination is not needed you can just omit the event from the workers altogether.

def work(event, req_queue, resp_queue):
    while True:
        ...
        y = resp_queue.get()
        if y is None:  # sentinel: no more work
            break
        print(name, 'output:', y)
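As a minimal, self-contained sketch of this sentinel pattern (shown with threading, which the question notes behaves the same way; the worker and variable names here are illustrative):

```python
import queue
import threading


def worker(resp_queue, results):
    # Block on the queue until a real item or the None sentinel arrives.
    while True:
        y = resp_queue.get()
        if y is None:  # sentinel: no more work
            break
        results.append(y)


resp_queue = queue.Queue()
results = []
t = threading.Thread(target=worker, args=(resp_queue, results))
t.start()
resp_queue.put(9)      # a real item
resp_queue.put(None)   # wake the worker and tell it to stop
t.join()
print(results)
```

The put(None) is what guarantees the blocked get() returns, so join() cannot hang.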

Relying on a bare queue.get() can also leak resources if the main process dies unexpectedly, so you should additionally pass a timeout to the queue rather than leave workers waiting forever.

y = resp_queue.get(timeout=0.1)  # raises queue.Empty after 0.1 seconds

This makes sure the processes will terminate eventually on "unexpected failures", but sending the None is what’s used for instantaneous termination.
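One possible worker loop combining the event with a get() timeout, so the worker periodically re-checks the stop flag instead of blocking forever (the name value and the 0.1 s timeout are illustrative):

```python
import queue  # queue.Empty is also the timeout exception for multiprocessing.Queue


def work(event, req_queue, resp_queue):
    name = 'worker'
    while not event.is_set():
        req_queue.put((name, 3))
        try:
            y = resp_queue.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing arrived in time: loop around and re-check the event
        print(name, 'output:', y)
```

With this shape, event.set() alone is enough to stop the worker within roughly one timeout period, even if no sentinel is ever sent.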

If you have multiple resp_queue.get() calls embedded throughout your code, so that a simple break on None won't work, you can call sys.exit() when you receive the None to terminate the worker. This performs the necessary cleanup (it raises SystemExit, which is only caught by a bare except:), and the code that intercepts the None and calls sys.exit() can be hidden in a subclass of multiprocessing.queues.Queue.

import sys
import multiprocessing.queues
from typing import Optional


class MyQueue(multiprocessing.queues.Queue):

    def get(self, block: bool = True, timeout: Optional[float] = None) -> object:
        # set a different default timeout here if you want
        return_value = super().get(block=block, timeout=timeout)
        if return_value is None:  # or another dummy class as a signal
            sys.exit()  # or raise a known exception
        return return_value
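For completeness, here is one runnable version of that idea. Note that multiprocessing.queues.Queue, unlike the multiprocessing.Queue() factory function, must be constructed with an explicit context argument:

```python
import multiprocessing
import multiprocessing.queues
import sys
from typing import Optional


class MyQueue(multiprocessing.queues.Queue):

    def get(self, block: bool = True, timeout: Optional[float] = None) -> object:
        return_value = super().get(block=block, timeout=timeout)
        if return_value is None:  # None is the termination sentinel
            sys.exit()  # raises SystemExit
        return return_value


# multiprocessing.queues.Queue requires an explicit context,
# unlike the multiprocessing.Queue() factory function:
q = MyQueue(ctx=multiprocessing.get_context())
q.put(42)
print(q.get())  # a regular item passes through unchanged
```

A worker using MyQueue then needs no special shutdown logic around its get() calls: receiving the sentinel exits the process from wherever it happens to be blocked.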
Answered By: Ahmed AEK

Here is an alternative to @AhmedAEK's solution which works for an open-ended number of request–response pairs in the work function, i.e. an open-ended number of req_queue.put((name, x)) and y = resp_queue.get() call pairs (in my real program I don't control the worker code, as it can be redefined by the user in a subclass):

...

if __name__ == '__main__':
    ...
    event.set()
    try:
        while True:
            (name, x) = req_queue.get(timeout=1)
            resp_queues[name].put(None)
    except queue.Empty:  # the Empty exception lives in the standard queue module
        pass
    ...
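The same drain pattern, sketched as a self-contained example with threading (the request value and the number of requests served are illustrative):

```python
import queue
import threading


def work(event, req_queue, resp_queue, name):
    # Open-ended request/response loop, as in the question.
    while not event.is_set():
        req_queue.put((name, 3))
        y = resp_queue.get()  # may block after event.set(); main replies None


event = threading.Event()
req_queue = queue.Queue()
resp_queue = queue.Queue()
t = threading.Thread(target=work, args=(event, req_queue, resp_queue, 'w1'))
t.start()

# Serve a few requests, then stop.
for _ in range(3):
    name, x = req_queue.get()
    resp_queue.put(x ** 2)
event.set()
# Drain: answer any in-flight requests with None so no worker stays blocked.
try:
    while True:
        name, x = req_queue.get(timeout=0.5)
        resp_queue.put(None)
except queue.Empty:
    pass
t.join()
print('stopped')
```

The drain loop replies to every request still in flight, so a worker that put a request just before (or just after) event.set() always gets a response and can observe the event on its next iteration.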
Answered By: Maggyero