Empty Python process hangs on join [sys.stderr.flush()]

Question:

Python gurus, I need your help. I am facing quite strange behavior:
an empty Python Process hangs on join. It looks like the fork inherits some locked resource.

Env:

  • Python version: 3.5.3
  • OS: Ubuntu 16.04.2 LTS
  • Kernel: 4.4.0-75-generic

Problem description:

1) I have a logger with a background thread that handles messages via a queue. Logger source code (a little bit simplified) is sketched below.
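
A minimal sketch, assuming the QueueHandler/QueueListener structure and the multiprocessing.Queue import that the answers below reference (details beyond those referenced names are assumptions):

import logging
import threading
from multiprocessing import Queue   # the first answer suggests queue.Queue instead


class QueueHandler(logging.Handler):
    """Puts log records on a queue instead of writing them directly."""
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def emit(self, record):
        self.queue.put_nowait(record)


class QueueListener(object):
    """Background thread that drains the queue into a real handler."""
    _sentinel = None

    def __init__(self, queue, handler):
        self.queue = queue
        self.handler = handler
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self._monitor, daemon=True)
        self._thread.start()

    def _monitor(self):
        while not self._stop.is_set():
            record = self.queue.get()
            if record is self._sentinel:
                break
            self.handler.handle(record)

    def stop(self):
        self._stop.set()
        self.queue.put_nowait(self._sentinel)
        self._thread.join()


def get_logger(name):
    queue = Queue()
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(QueueHandler(queue))
    # StreamHandler defaults to sys.stderr -- the stream the process hangs on
    listener = QueueListener(queue, logging.StreamHandler())
    logger.start = listener.start
    logger.stop = listener.stop
    return logger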

2) And I have a simple script which uses my logger (just code to demonstrate my problem):

import os
from multiprocessing import Process
from my_logging import get_logger


def func():
    pass


if __name__ == '__main__':

    logger = get_logger(__name__)
    logger.start()
    for _ in range(2):
        logger.info('message')

    proc = Process(target=func)
    proc.start()
    proc.join(timeout=3)
    print('TEST PROCESS JOINED: is_alive={0}'.format(proc.is_alive()))

    logger.stop()
    print('EXIT')

Sometimes this test script hangs: it hangs on joining the process “proc” (when the script completes execution), and the test process “proc” stays alive.

To reproduce this problem you can run the script in a loop:

$ for i in {1..100} ; do /opt/python3.5.3/bin/python3.5 test.py ; done

Investigation:

Strace shows the following:

strace: Process 25273 attached
futex(0x2275550, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff

And I figured out the place where the process hangs. It hangs in the multiprocessing module, file process.py, line 269 (Python 3.5.3), on flushing STDERR:

...
267    util.info('process exiting with exitcode %d' % exitcode)
268    sys.stdout.flush()
269    sys.stderr.flush()
...

If line 269 is commented out, the script always completes successfully.

My thoughts:

By default, logging.StreamHandler uses sys.stderr as its stream.

If the process is forked while the logger thread is flushing data to STDERR, the child's copy of the stderr lock is left in the locked state, and the child later hangs when it flushes STDERR itself.
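
A minimal demonstration of that hypothesis (my own sketch, independent of the logger): a lock that another thread holds at fork time stays locked forever in the child, because the thread that would release it does not exist there.

import os
import threading
import time

lock = threading.Lock()


def hold_lock():
    with lock:
        time.sleep(1)   # fork happens while this thread owns the lock


if __name__ == '__main__':
    t = threading.Thread(target=hold_lock)
    t.start()
    time.sleep(0.1)     # make sure the lock is already taken
    pid = os.fork()     # POSIX only, like multiprocessing's default 'fork' start method
    if pid == 0:
        # Child: the inherited lock is marked "held", but its owner is gone.
        # A plain acquire() would deadlock; the timeout makes that visible.
        print('child acquired lock:', lock.acquire(timeout=2))   # prints False
        os._exit(0)
    t.join()
    os.waitpid(pid, 0)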

Some workarounds which solve the problem:

  • Use Python 2.7. I can’t reproduce it with Python 2.7. Maybe the timing just prevents me from reproducing the problem.
  • Use a process instead of a thread to handle messages in the logger.
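
One more mitigation commonly applied to this class of fork-plus-threads deadlock (my addition, not from the original post): use the 'spawn' start method, so the child is a fresh interpreter and inherits no lock state from the parent:

import multiprocessing as mp


def func():
    pass


if __name__ == '__main__':
    mp.set_start_method('spawn')   # child does not inherit the parent's threads/locks
    proc = mp.Process(target=func)
    proc.start()
    proc.join(timeout=3)
    print('TEST PROCESS JOINED: is_alive={0}'.format(proc.is_alive()))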

Do you have any ideas on this behavior? Where is the problem? Am I doing something wrong?

Asked By: Dmitry Moroz


Answers:

Question: Sometimes … Test process “proc” stays alive.

I could only reproduce your

TEST PROCESS:0 JOINED: is_alive=True

by adding a time.sleep(5) to def func():.
You use proc.join(timeout=3), so that is the expected behavior: join() returns after 3 seconds while the process is still running.

Conclusion:
Overloading your system (in my environment this starts at around 30 concurrently running processes) triggers your proc.join(timeout=3) timeout.
You may want to rethink your test case for reproducing the problem.

One approach, I think, is fine-tuning your process/thread with some time.sleep(0.05) calls to give up a timeslice.


  1. You are using from multiprocessing import Queue;
    use from queue import Queue instead, since the consumer here is a thread, not a process.

    From the documentation:
    Class multiprocessing.Queue
    A queue class for use in a multi-processing (rather than multi-threading) context.

  2. In class QueueHandler(logging.Handler):, prevent the handler from still doing

    self.queue.put_nowait(record)
    

    after

    class QueueListener(object):
    ...
    def stop(self):
        ...
    

    has been called. Implement, for instance, a stop event (a consolidated sketch follows this list):

    from threading import Event

    class QueueHandler(logging.Handler):
        def __init__(self):
            super().__init__()
            self.stop = Event()
            ...
    
  3. In def _monitor(self): use only ONE while ... loop.
    Wait until the self._thread has stopped:

    class QueueListener(object):
        ...
        def stop(self):
            self.handler.stop.set()
            # drain what is already queued before shutting down
            while not self.queue.empty():
                time.sleep(0.5)
            # Don't use double flags
            #self._stop.set()
            self.queue.put_nowait(self._sentinel)
            self._thread.join()
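
Putting items 2 and 3 together, emit() would then refuse to enqueue records once shutdown has begun (my sketch of the suggestion above, not code from the answer):

class QueueHandler(logging.Handler):
    def emit(self, record):
        if self.stop.is_set():
            return   # listener is shutting down; drop the record
        self.queue.put_nowait(record)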
    
Answered By: stovfl

It looks like this behaviour is related to this issue: http://bugs.python.org/issue6721 (“Locks in the standard library should be sanitized on fork”).

Answered By: dosER

The same can be seen with the following snippet:

#!/usr/bin/env python
import logging
import multiprocessing
import threading

print("start")

logging.basicConfig()
logger = logging.getLogger("main")

def thread_func():
    for i in range(100):
        logger.warning("log from thread %d", i)
    print("thread end")

def proc_func():
    print("proc end")

print("main func")

thr = threading.Thread(target=thread_func)
prc = multiprocessing.Process(target=proc_func)

thr.start()
prc.start()


thr.join()
print("join1")

prc.join()
print("join2")

print("main func end")

prc never joins. It also waits in flush on a futex. Tested with CPython 3.9.2, 3.9.16, 3.10.10, and 3.11.2; I cannot reproduce it with CPython 2.7.18.

Answered By: Marek