Empty python process hangs on join [sys.stderr.flush()]
Question:
Python gurus, I need your help. I've run into quite strange behavior:
an empty Python Process hangs on join. It looks like the fork inherits some locked resource.
Env:
- Python version: 3.5.3
- OS: Ubuntu 16.04.2 LTS
- Kernel: 4.4.0-75-generic
Problem description:
1) I have a logger with a background thread that handles messages, plus a queue feeding that thread. Logger source code (slightly simplified).
2) And I have a simple script which uses my logger (just enough code to show the problem):
import os
from multiprocessing import Process
from my_logging import get_logger

def func():
    pass

if __name__ == '__main__':
    logger = get_logger(__name__)
    logger.start()

    for _ in range(2):
        logger.info('message')

    proc = Process(target=func)
    proc.start()
    proc.join(timeout=3)
    print('TEST PROCESS JOINED: is_alive={0}'.format(proc.is_alive()))

    logger.stop()
    print('EXIT')
Sometimes this test script hangs: it hangs joining process “proc” as the script finishes, and the test process “proc” stays alive.
To reproduce this problem you can run the script in loop:
$ for i in {1..100} ; do /opt/python3.5.3/bin/python3.5 test.py ; done
Investigation:
strace shows the following:
strace: Process 25273 attached
futex(0x2275550, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff
And I found the place where the process hangs: the multiprocessing module, file process.py, line 269 (python3.5.3), while flushing STDERR:
...
267 util.info('process exiting with exitcode %d' % exitcode)
268 sys.stdout.flush()
269 sys.stderr.flush()
...
If line 269 is commented out, the script always completes successfully.
My thoughts:
By default logging.StreamHandler uses sys.stderr as its stream.
If the process is forked while the logger thread is flushing data to STDERR, the child's copy of the stderr lock is left in the acquired state, and the child then hangs forever when it tries to flush STDERR itself.
Some workarounds which solve the problem:
- Use python2.7. I can't reproduce it with python2.7, though perhaps the timing there just makes it hard to reproduce.
- Use a process instead of a thread to handle messages in the logger.
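A third workaround worth considering (my addition, not from the original list): use the 'spawn' start method, available via multiprocessing.set_start_method since Python 3.4. A spawned child starts a fresh interpreter instead of forking, so it cannot inherit a lock that the parent's logger thread happened to be holding at fork time. A minimal sketch:

```python
import multiprocessing as mp

def func():
    pass

if __name__ == '__main__':
    # 'spawn' starts a fresh interpreter instead of forking,
    # so no locks held by the parent's threads are inherited.
    mp.set_start_method('spawn')
    proc = mp.Process(target=func)
    proc.start()
    proc.join(timeout=3)
    print('is_alive={0}'.format(proc.is_alive()))
```

The trade-off is that spawn is slower than fork and requires the target function to be importable from the main module.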
Do you have any ideas about this behavior? Where is the problem? Am I doing something wrong?
Answers:
Question: “Sometimes … Test process “proc” stay alive.”
I could only reproduce your
TEST PROCESS:0 JOINED: is_alive=True
by adding a time.sleep(5) to def func().
You use proc.join(timeout=3), so that is the expected behavior: join returns after 3 seconds even though the process is still alive.
Conclusion:
Overloading your system (in my environment this starts at around 30 concurrent processes) triggers your proc.join(timeout=3) timeout.
You may want to rethink your test case to reproduce your problem. One approach, I think, is fine-tuning your process/thread with some time.sleep(0.05) calls to give up a timeslice.
- You are using from multiprocessing import Queue; use from queue import Queue instead, since only threads are involved. From the documentation:
class multiprocessing.Queue
A queue class for use in a multi-processing (rather than multi-threading) context.
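To illustrate that suggestion, a minimal thread-only producer/consumer sketch using queue.Queue (the names here are my own, not from the question's my_logging module):

```python
import queue
import threading

q = queue.Queue()
SENTINEL = object()  # unique marker telling the worker to exit

def worker():
    # Drain the queue until the sentinel arrives.
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        print('handled:', item)

t = threading.Thread(target=worker)
t.start()
q.put('message')
q.put(SENTINEL)  # ask the worker to stop
t.join()
```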
- In class QueueHandler(logging.Handler):, prevent self.queue.put_nowait(record) from running after QueueListener.stop() has been called. Implement, for instance:

class QueueHandler(logging.Handler):
    def __init__(self):
        self.stop = Event()
        ...
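A fuller sketch of what that hint might look like (my reconstruction; the constructor signature and the Event attribute name are assumptions, not the question's actual my_logging code):

```python
import logging
from threading import Event

class QueueHandler(logging.Handler):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue
        self.stop = Event()  # set by QueueListener.stop()

    def emit(self, record):
        # Once the listener is shutting down, drop records instead
        # of enqueueing them after the consumer thread is gone.
        if not self.stop.is_set():
            self.queue.put_nowait(record)
```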
- In def _monitor(self), use only ONE while loop.
Wait until self._thread has stopped:

class QueueListener(object):
    ...
    def stop(self):
        self.handler.stop.set()
        while not self.queue.empty():
            time.sleep(0.5)
        # Don't use double flags
        #self._stop.set()
        self.queue.put_nowait(self._sentinel)
        self._thread.join()
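Note that since Python 3.2 the standard library ships exactly this pattern as logging.handlers.QueueHandler / QueueListener, whose stop() enqueues a sentinel and joins the worker thread for you. A minimal usage sketch:

```python
import logging
import queue
import sys
from logging.handlers import QueueHandler, QueueListener

q = queue.Queue()
log = logging.getLogger('demo')
log.setLevel(logging.INFO)
log.addHandler(QueueHandler(q))  # producers just enqueue records

# A background thread drains the queue into the real handler(s).
listener = QueueListener(q, logging.StreamHandler(sys.stdout))
listener.start()
log.info('message')
listener.stop()  # sends the sentinel and joins the worker thread
```

Using the stdlib classes sidesteps the shutdown-ordering bugs discussed above.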
It looks like this behaviour is related to this issue: http://bugs.python.org/issue6721
The same can be seen with the following snippet:
#!/usr/bin/env python
import logging
import multiprocessing
import threading

print("start")
logging.basicConfig()
logger = logging.getLogger("main")

def thread_func():
    for i in range(100):
        logger.warning("log from thread %d", i)
    print("thread end")

def proc_func():
    pass
    print("proc end")

print("main func")
thr = threading.Thread(target=thread_func)
prc = multiprocessing.Process(target=proc_func)
thr.start()
prc.start()
thr.join()
print("join1")
prc.join()
print("join2")
print("main func end")
prc never joins; it too waits in flush on a futex. Tested with CPython 3.9.2, 3.9.16, 3.10.10, and 3.11.2. I cannot reproduce it with CPython 2.7.18.