Using multithreading with time.sleep and unique loggers

Question:

I’m trying to make sure that several threads start as close to each other as possible, and for that I’m using time.sleep. Each thread will have its own logger, which will output to its unique file.
There’s something very strange happening though…

  • Sometimes, not all logger files are created. In the example below, instead of 4 worker files, sometimes I’ll get 2, other times 3. I don’t see a pattern.

Here’s a minimal working example:

import concurrent.futures
import logging
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path

#################        Logger           ########################
main_logger = logging.getLogger("main_logger")
main_logger.setLevel(logging.DEBUG)
file_handler = logging.FileHandler(
    filename="./logs/print_multi.txt", mode="w"
)
file_handler.setLevel(logging.DEBUG)
formatter = logging.Formatter(
    "%(asctime)s - %(threadName)s - %(name)s - %(levelname)s - %(message)s"
)
file_handler.setFormatter(formatter)
main_logger.addHandler(file_handler)


def print_multi(start_time: datetime, index):
    # cleaning the worker directory
    for path in Path("./logs/workers_print/").glob("**/*"):
        if path.is_file():
            path.unlink()
    # creating logging
    worker_logger = logging.getLogger(f"print_worker_{index}")
    worker_logger.setLevel(logging.DEBUG)
    file_handler = logging.FileHandler(
        filename=f"./logs/workers_print/print_worker_{index}_ignore.txt",
        mode="w",
    )
    file_handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )
    file_handler.setFormatter(formatter)
    worker_logger.addHandler(file_handler)

    # logging the times
    worker_logger.debug(f"This thread will start at {start_time}")
    time_now = datetime.now(tz=timezone.utc)
    seconds_to_start = (start_time - time_now).total_seconds()
    worker_logger.debug(f"seconds to start -> {seconds_to_start}")
    time.sleep(seconds_to_start)
    worker_logger.debug(f"We're in thread {index}")
    print(f"We're in thread {index}")

def main():
    main_logger.debug("Setting ThreadPoolExecuter")
    start_time = datetime.now(tz=timezone.utc) + timedelta(seconds=10)
    main_logger.debug(f"start_time -> {start_time}")
    workers = 4  # os.cpu_count()
    main_logger.debug(f"num_workers -> {workers}")
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(
            print_multi, [start_time] * workers, range(workers)
        )
        for r in results:
            pass
    main_logger.debug("Finishn")
    


main()

Here’s an example of a traceback from a run where I got only the worker 2 and 3 files, instead of 0, 1, 2, 3:

Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 205, in <listcomp>
    return [fn(*args) for args in chunk]
  File "multithreading_MWE.py", line 72, in print_multi
    path.unlink()
  File "/usr/lib/python3.10/pathlib.py", line 1206, in unlink
    self._accessor.unlink(self)
FileNotFoundError: [Errno 2] No such file or directory: 'logs/workers_print/print_worker_1_ignore.txt'

Answers:

The issue is that the print_multi function cleans the worker directory at the start of every worker. Each worker deletes every file it finds in ./logs/workers_print/, so a worker that starts slightly later removes the log files that earlier workers have just created; and when two workers glob the same file and both try to unlink it, the slower one raises the FileNotFoundError shown in the traceback.

To avoid this, move the cleaning of the worker directory into the main function, before the executor is started. The directory is then cleaned exactly once, no worker can delete another worker’s log file, and all log files are created successfully.
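Here is a minimal sketch of that change, assuming the same paths, imports, and logger setup as in the question: print_multi stays exactly as it is except that its cleanup loop is removed, and main() cleans the directory once before submitting the workers.

def main():
    # clean the worker directory exactly once, before any worker can create its log file
    for path in Path("./logs/workers_print/").glob("**/*"):
        if path.is_file():
            path.unlink()

    main_logger.debug("Setting ProcessPoolExecutor")
    start_time = datetime.now(tz=timezone.utc) + timedelta(seconds=10)
    main_logger.debug(f"start_time -> {start_time}")
    workers = 4  # os.cpu_count()
    main_logger.debug(f"num_workers -> {workers}")
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(
            print_multi, [start_time] * workers, range(workers)
        )
        for r in results:
            pass
    main_logger.debug("Finish")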

Answered By: Girolamo