How to stop subprocesses that communicate with the main process through request and response queues?
Question:
I have a Python program that starts N subprocesses (clients) which send requests to and listen for responses from the main process (server). The interprocess communication uses pipes through multiprocessing.Queue
objects according to the following scheme (one queue per consumer, so one request queue and N response queues):
              1 req_queue
                              <-- Process-1
MainProcess <-- ============= <-- …
                              <-- Process-N

              N resp_queues
            --> ============= --> Process-1
MainProcess --> ============= --> …
            --> ============= --> Process-N
The (simplified) program:
import multiprocessing

def work(event, req_queue, resp_queue):
    while not event.is_set():
        name = multiprocessing.current_process().name
        x = 3
        req_queue.put((name, x))
        print(name, 'input:', x)
        y = resp_queue.get()
        print(name, 'output:', y)
if __name__ == '__main__':
    event = multiprocessing.Event()
    req_queue = multiprocessing.Queue()
    resp_queues = {}
    processes = {}
    N = 10
    for _ in range(N):  # start N subprocesses
        resp_queue = multiprocessing.Queue()
        process = multiprocessing.Process(
            target=work, args=(event, req_queue, resp_queue))
        resp_queues[process.name] = resp_queue
        processes[process.name] = process
        process.start()
    for _ in range(100):  # handle 100 requests
        (name, x) = req_queue.get()
        y = x ** 2
        resp_queues[name].put(y)
    event.set()  # stop the subprocesses
    for process in processes.values():
        process.join()
The problem I am facing is that the execution of this program (under Python 3.11.2) sometimes never stops: it hangs at the line y = resp_queue.get() in some subprocess once the main process notifies the subprocesses to stop at the line event.set(). The problem is the same if I use the threading library instead of the multiprocessing library.
How can I stop the subprocesses?
Answers:
queue.get() is a blocking call: a thread (or process) that reaches it waits until an item is put on the queue, and it will not be woken up by setting the event if it is already blocked inside get().
The way that’s usually done (even in the standard modules) is to send a None (or another sentinel object) on the queue to wake the processes waiting on it and have them terminate when there is no more work. Note that resp_queues is a dict, so you must iterate over its values:
event.set()
for queue_obj in resp_queues.values():
    queue_obj.put(None)
This makes your event useful only for early termination; if early termination is not needed, you can omit the event from the workers altogether.
def work(event, req_queue, resp_queue):
    while True:
        ...
        y = resp_queue.get()
        print(name, 'output:', y)
        if y is None:
            break
Obviously, a bare queue.get() can leak resources if the main process fails, so you should also use a timeout on the queue rather than letting it wait forever (get() raises queue.Empty when the timeout expires, so the worker must handle that exception):
y = resp_queue.get(timeout=0.1)
This makes sure the processes will terminate eventually on "unexpected failures", while sending the None sentinel is what provides prompt termination.
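One way to combine the two ideas (a sketch, not the asker's exact code; the worker here drops the event entirely, as suggested above): bound each get() with a timeout so the worker eventually gives up if the main process dies, and also honor the None sentinel for prompt shutdown.

```python
import multiprocessing
import queue

def work(req_queue, resp_queue):
    # Hypothetical worker: bounds each get() with a timeout and also
    # honors the None sentinel sent by the main process.
    name = multiprocessing.current_process().name
    while True:
        req_queue.put((name, 3))
        try:
            y = resp_queue.get(timeout=5)  # don't wait forever
        except queue.Empty:
            break  # main process is probably gone; give up
        if y is None:  # shutdown sentinel
            break
        print(name, 'output:', y)
```

The timeout value is a trade-off: too short and workers may give up during a slow but healthy server's normal operation, too long and a crashed server keeps workers alive for that much longer.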
If you have multiple resp_queue.get() calls scattered throughout your code, such that a simple break on None won’t work, then you can call sys.exit() when you receive the None to terminate the worker. This performs the necessary cleanup, and the resulting SystemExit is not caught by except Exception: (only by a bare except: or an explicit except SystemExit:). The code that intercepts the None and calls sys.exit can be hidden in a subclass of multiprocessing.queues.Queue.
import sys
import multiprocessing
import multiprocessing.queues
from typing import Optional

class MyQueue(multiprocessing.queues.Queue):
    def __init__(self, *args, ctx=None, **kwargs):
        # the base class requires an explicit multiprocessing context
        super().__init__(*args, ctx=ctx or multiprocessing.get_context(), **kwargs)

    def get(self, block: bool = True, timeout: Optional[float] = None) -> object:
        # set a default alternative timeout if you want
        return_value = super().get(block=block, timeout=timeout)
        if return_value is None:  # or another dummy class as a signal
            sys.exit()  # or raise a known exception
        return return_value
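For completeness, here is a self-contained sketch of how such a subclass behaves (the class name is illustrative; note the explicit ctx argument that multiprocessing.queues.Queue requires):

```python
import multiprocessing
import multiprocessing.queues
import sys

class SentinelQueue(multiprocessing.queues.Queue):
    """Queue whose get() terminates the calling worker when it sees None."""
    def get(self, block=True, timeout=None):
        value = super().get(block=block, timeout=timeout)
        if value is None:  # the shutdown sentinel
            sys.exit()  # raises SystemExit; not caught by `except Exception`
        return value

if __name__ == '__main__':
    q = SentinelQueue(ctx=multiprocessing.get_context())
    q.put(42)
    print(q.get())  # normal items pass through unchanged
    q.put(None)
    try:
        q.get()  # the sentinel raises SystemExit
    except SystemExit:
        print('worker would exit here')
```

In a real worker you would not catch SystemExit at all: it would propagate out of the worker function and end the process cleanly.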
Here is an alternative to @AhmedAEK’s solution which works for an open-ended number of request–response pairs in the work function, i.e. an open-ended number of req_queue.put((name, x)) / y = resp_queue.get() call pairs (in my real program I don’t control the worker code, because it can be redefined by the user in a subclass):
import queue  # needed for queue.Empty
...

if __name__ == '__main__':
    ...
    event.set()
    try:
        while True:
            (name, x) = req_queue.get(timeout=1)
            resp_queues[name].put(None)
    except queue.Empty:
        pass
    ...
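Putting the pieces together, the elided parts above can be filled in as follows: a complete, runnable sketch of this drain-and-sentinel approach (the worker and request counts are function parameters here for illustration; the original program uses 10 workers and 100 requests).

```python
import multiprocessing
import queue

def work(event, req_queue, resp_queue):
    # Worker: send requests until the event is set or the sentinel arrives.
    name = multiprocessing.current_process().name
    while not event.is_set():
        req_queue.put((name, 3))
        y = resp_queue.get()
        if y is None:  # sentinel: the server is shutting down
            break

def serve(n_workers=10, n_requests=100):
    event = multiprocessing.Event()
    req_queue = multiprocessing.Queue()
    resp_queues = {}
    processes = {}
    for _ in range(n_workers):
        resp_queue = multiprocessing.Queue()
        process = multiprocessing.Process(
            target=work, args=(event, req_queue, resp_queue))
        resp_queues[process.name] = resp_queue
        processes[process.name] = process
        process.start()
    handled = 0
    for _ in range(n_requests):  # handle the requested number of requests
        name, x = req_queue.get()
        resp_queues[name].put(x ** 2)
        handled += 1
    event.set()
    # Answer any in-flight requests with the sentinel so no worker
    # stays blocked in resp_queue.get().
    try:
        while True:
            name, _ = req_queue.get(timeout=1)
            resp_queues[name].put(None)
    except queue.Empty:
        pass
    for process in processes.values():
        process.join()
    return handled

if __name__ == '__main__':
    print(serve(4, 20))  # prints 20
```

Every request a worker puts on req_queue is answered either with a result (during the main loop) or with None (during the drain), so no worker can remain blocked in resp_queue.get() and all joins complete.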