Python Multiprocessing, Pytest, and Windows: How to perform pytest live-logging when using multiprocessing in a cross-platform setting?

Question:

I created a Github repo with all the code necessary to test the issue. For the sake of consistency and ease of search, I pasted most of the repo’s README.md below.

NOTE: This question requires Linux or a system that can (preferably a combination of Windows+Linux) support "spawn" and "fork" multiprocessing contexts.

Setup Environment

Install the required packages through the following:

pip install logger_tt loguru pytest

My testing environment is the following:

  • Python 3.9
  • Ubuntu 22.04 and Windows 11

Requirements

I am trying to make live logging with the pytest framework work with multiprocessing. The solution requires to function in Windows, MacOS and Linux operating systems. When developing multiprocessing, it is important considering the start process method, these are the available methods for each OS according to this reference:

  • Windows: spawn
  • MacOS: spawn, fork, forkserver (only spawn is stable, therby default)
  • Linux: spawn, fork, forkserver (default spawn)

Avoid (if possible) using queues, servers, and custom pytest hooks.

Logs should go to the stdout or be visible in the terminal.

Resources

Below is a collection of

The Test

First, I tried to see if three logging libraries (built-in logging, logger_tt, and loguru) worked with just multiprocessing, as done in mp_logging.py. In that file, there are 2 parameters START_METHOD and LOGGING_METHOD.

After getting the 3 libraries to work with both fork and spawn process starting methods, I tried to replicate the test within the pytest framework. I was not able to capture the children logging with any library when using spawn. Given that Windows can only use spawn, this is not a solution.

Expected Output

Built-in Logging (both spawn and fork)

(test) eduardo@avocado-XPS-13-9300:~/GitHub/SOQuestion$ python mp_logging.py
2022-10-30 23:07:15,019 [INFO] test: Parent logging
2022-10-30 23:07:15,084 [INFO] test: Children Logging
2022-10-30 23:07:15,086 [INFO] test: Children Logging
2022-10-30 23:07:15,087 [INFO] test: Children Logging
2022-10-30 23:07:15,095 [INFO] test: Parent logging: end

logger_tt (spawn)

(test) eduardo@avocado-XPS-13-9300:~/GitHub/SOQuestion$ python mp_logging.py
[2022-10-30 23:08:36] INFO: Parent logging
[2022-10-30 23:08:36] INFO: SpawnProcess-1 Children Logging
[2022-10-30 23:08:36] INFO: SpawnProcess-2 Children Logging
[2022-10-30 23:08:36] INFO: SpawnProcess-3 Children Logging
[2022-10-30 23:08:36] INFO: Parent logging: end

logger_tt (fork)

(test) eduardo@avocado-XPS-13-9300:~/GitHub/SOQuestion$ python mp_logging.py
[2022-10-30 23:09:42] INFO: Parent logging
[2022-10-30 23:09:42] INFO: SpawnProcess-1 Children Logging
[2022-10-30 23:09:42] INFO: SpawnProcess-2 Children Logging
[2022-10-30 23:09:42] INFO: SpawnProcess-3 Children Logging
[2022-10-30 23:09:42] INFO: Parent logging: end

loguru (both spawn and fork)

(test) eduardo@avocado-XPS-13-9300:~/GitHub/SOQuestion$ python mp_logging.py
2022-10-30 23:11:07.608 | INFO     | __main__:<module>:92 - Parent logging
2022-10-30 23:11:07.712 | INFO     | __mp_main__:target_function:83 - Children Logging
2022-10-30 23:11:07.713 | INFO     | __mp_main__:target_function:83 - Children Logging
2022-10-30 23:11:07.715 | INFO     | __mp_main__:target_function:83 - Children Logging
2022-10-30 23:11:07.731 | INFO     | __main__:<module>:103 - Parent logging: end

Actual Output

logging and logger_tt on fork: Correct output

(test) eduardo@avocado-XPS-13-9300:~/GitHub/SOQuestion$ pytest
======================================== test session starts ========================================
platform linux -- Python 3.8.13, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/eduardo/GitHub/SOQuestion, configfile: pyproject.toml
collected 1 item                                                                                    

test/test_logging.py::test_logging_with_multiprocessing 
------------------------------------------- live log call -------------------------------------------
Parent logging
Children Logging
Children Logging
Children Logging
Parent logging: end
PASSED                                                                                        [100%]

========================================= 1 passed in 0.02s =========================================

logging and logger_tt on spawn: Missing children output

(test) eduardo@avocado-XPS-13-9300:~/GitHub/SOQuestion$ pytest
======================================== test session starts ========================================
platform linux -- Python 3.8.13, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/eduardo/GitHub/SOQuestion, configfile: pyproject.toml
collected 1 item                                                                                    

test/test_logging.py::test_logging_with_multiprocessing 
------------------------------------------- live log call -------------------------------------------
Parent logging
Parent logging: end
PASSED                                                                                        [100%]

========================================= 1 passed in 0.26s =========================================

loguru for both spawn and fork: no output

(test) eduardo@avocado-XPS-13-9300:~/GitHub/SOQuestion$ pytest
======================================== test session starts ========================================
platform linux -- Python 3.8.13, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/eduardo/GitHub/SOQuestion, configfile: pyproject.toml
collected 1 item                                                                                    

test/test_logging.py::test_logging_with_multiprocessing PASSED                                [100%]

========================================= 1 passed in 0.02s =========================================

Any help is greatly appreciated!

Asked By: Eduardo Davalos

||

Answers:

After experimenting, I determined that indeed pytestwas the culprit responsible for not showing the child processes’ logs in sys.stdout. With this, I found a GitHub issue from PyTest and used a proposed solution to achieve my requirements.

I extended the solution and added more tests to verify that the solution works for both fork and spawn multiprocessing contexts (testing if it works on UNIX and Windows systems). I have pushed the solution to the aforementioned GitHub repo (commit). Here is the solution and the obtained output:

import logging
import logging.handlers
import logging.config
import socket
from collections import deque
import pickle
import threading
import multiprocessing as mp

import pytest
from pytest_lazyfixture import lazy_fixture


LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": True,
    "formatters": {
        "standard": {
            "format": "%(asctime)s [%(levelname)s] %(name)s: %(message)s",
            "datefmt": "%Y-%m-%d %H:%M:%S",
        },
    },
    "handlers": {
        "console": {
            "level": "DEBUG",
            "formatter": "standard",
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stdout",  # Default is stderr
        },
        "datagram": {
            "level": "DEBUG",
            "formatter": "standard",
            "class": "logging.handlers.DatagramHandler",
            "host": "127.0.0.1",
            "port": 5555
        }
    },
    "loggers": {
        "": {
            "handlers": ["console"],
            "level": "DEBUG",
            "propagate": True,
        },
        "subprocess": {
            "handlers": ["datagram"],
            "level": "DEBUG",
            "propagate": True
        }
    },
}

# Setup the logging configuration
logging.config.dictConfig(LOGGING_CONFIG)

@pytest.fixture
def logreceiver():

    def listener():

        # Create server and logger to relay messages
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("127.0.0.1", 5555))
        logger = logging.getLogger("")

        # Continue listening until signaled to stop
        while True:

            # Listen for information
            data = s.recv(4096)
            if data == b"die":
                break

            # Dont forget to skip over the 32-bit length prepended
            logrec = pickle.loads(data[4:])
            rec = logging.LogRecord(
                name=logrec['name'],
                level=logrec['levelno'],
                pathname=logrec['pathname'],
                lineno=logrec['lineno'],
                msg=logrec['msg'],
                args=logrec['args'],
                exc_info=logrec['exc_info'],
                func=logrec['funcName'],

            )
            logger.handle(rec)

        # Safely shutdown socket
        s.close()

    # Start relaying thread
    receiver_thread = threading.Thread(target=listener)
    receiver_thread.start()

    yield

    # Shutting down thread
    t = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    t.sendto(b"die", ("127.0.0.1", 5555))
    receiver_thread.join()

def target_function():

    # Depending on the type of process, get the logger
    if 'Fork' in mp.current_process().name:
        logger = logging.getLogger("")
    elif "Spawn" in mp.current_process().name:
        logger = logging.getLogger("subprocess")
    else:
        raise RuntimeError("Invalid multiprocessing spawn method.")

    # Test logging
    logger.info("Children Logging")


@pytest.mark.parametrize(
    "_logreceiver, ctx",
    [
        (lazy_fixture("logreceiver"), mp.get_context("spawn")),
        (lazy_fixture("logreceiver"), mp.get_context("fork")),

    ]
)
def test_something(_logreceiver, ctx):

    # Try different methods
    logger = logging.getLogger("")
    logger.info("Parent logging")

    ps = []
    for i in range(3):
        p = ctx.Process(target=target_function)
        p.start()
        ps.append(p)

    for p in ps:
        p.join()

    logger.info("Parent logging: end")

The approach uses a Thread listening to a UDP socket in the main process. Then the children processes send their logs to the UDP port. The Thread later relays the logs to the root logger. By avoiding using QueueHandler and other queue approaches, you don’t have to manage queues in your subprocesses.

With this solution, I get the following test output:

======================================== test session starts ========================================
platform linux -- Python 3.8.13, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/eduardo/GitHub/SOQuestion, configfile: pyproject.toml
plugins: lazy-fixture-0.6.3
collected 2 items                                                                                   

test/test_logging_working.py::test_something[logreceiver-ctx0] 
------------------------------------------- live log call -------------------------------------------
2022-11-06 10:48:51.748 [    INFO] Parent logging (test_logging_working.py:128)
2022-11-06 10:48:51.994 [    INFO] Children Logging (test_logging_working.py:113)
2022-11-06 10:48:52.005 [    INFO] Children Logging (test_logging_working.py:113)
2022-11-06 10:48:52.031 [    INFO] Children Logging (test_logging_working.py:113)
2022-11-06 10:48:52.079 [    INFO] Parent logging: end (test_logging_working.py:139)
PASSED                                                                                        [ 50%]
test/test_logging_working.py::test_something[logreceiver-ctx1] 
------------------------------------------- live log call -------------------------------------------
2022-11-06 10:48:52.084 [    INFO] Parent logging (test_logging_working.py:128)
2022-11-06 10:48:52.086 [    INFO] Children Logging (test_logging_working.py:113)
2022-11-06 10:48:52.088 [    INFO] Children Logging (test_logging_working.py:113)
2022-11-06 10:48:52.089 [    INFO] Children Logging (test_logging_working.py:113)
2022-11-06 10:48:52.091 [    INFO] Parent logging: end (test_logging_working.py:139)
PASSED                                                                                        [100%]

========================================= 2 passed in 0.37s =========================================

[Process exited 0]

Answered By: Eduardo Davalos