Using multithreading with time.sleep and unique loggers
Question:
I’m trying to make sure that several threads start as close to each other as possible, and for that I’m using time.sleep. Each thread will have its own logger, which will output to its own unique file.
There’s something very strange happening though:
- Sometimes, not all logger files are created. In the example below, instead of 4 worker files, sometimes I’ll get 2, sometimes 3. I don’t see a pattern.
Here’s a minimal working example:
import concurrent.futures
import logging
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path

################# Logger ########################
main_logger = logging.getLogger("main_logger")
main_logger.setLevel(logging.DEBUG)
file_handler = logging.FileHandler(
    filename="./logs/print_multi.txt", mode="w"
)
file_handler.setLevel(logging.DEBUG)
formatter = logging.Formatter(
    "%(asctime)s - %(threadName)s - %(name)s - %(levelname)s - %(message)s"
)
file_handler.setFormatter(formatter)
main_logger.addHandler(file_handler)


def print_multi(start_time: datetime, index):
    # cleaning the worker directory
    for path in Path("./logs/workers_print/").glob("**/*"):
        if path.is_file():
            path.unlink()
    # creating logging
    worker_logger = logging.getLogger(f"print_worker_{index}")
    worker_logger.setLevel(logging.DEBUG)
    file_handler = logging.FileHandler(
        filename=f"./logs/workers_print/print_worker_{index}_ignore.txt",
        mode="w",
    )
    file_handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )
    file_handler.setFormatter(formatter)
    worker_logger.addHandler(file_handler)
    # logging the times
    worker_logger.debug(f"This thread will start at {start_time}")
    time_now = datetime.now(tz=timezone.utc)
    seconds_to_start = (start_time - time_now).total_seconds()
    worker_logger.debug(f"seconds to start -> {seconds_to_start}")
    time.sleep(seconds_to_start)
    worker_logger.debug(f"We're in thread {index}")
    print(f"We're in thread {index}")


def main():
    main_logger.debug("Setting ProcessPoolExecutor")
    start_time = datetime.now(tz=timezone.utc) + timedelta(seconds=10)
    main_logger.debug(f"start_time -> {start_time}")
    workers = 4  # os.cpu_count()
    main_logger.debug(f"num_workers -> {workers}")
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(
            print_multi, [start_time] * workers, range(workers)
        )
        for r in results:
            pass
    main_logger.debug("Finish")


if __name__ == "__main__":
    main()
Here’s an example of a traceback from a run where I got only the worker 2 and 3 files instead of 0, 1, 2, 3:
Traceback (most recent call last):
File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
return [fn(*args) for args in chunk]
File "/usr/lib/python3.10/concurrent/futures/process.py", line 205, in <listcomp>
return [fn(*args) for args in chunk]
File "multithreading_MWE.py", line 72, in print_multi
path.unlink()
File "/usr/lib/python3.10/pathlib.py", line 1206, in unlink
self._accessor.unlink(self)
FileNotFoundError: [Errno 2] No such file or directory: 'logs/workers_print/print_worker_1_ignore.txt'
Answers:
The issue is that the print_multi function cleans the worker directory at the start of every worker process, and all workers do this concurrently. That causes two races. First, a worker that starts slightly later deletes the log files that earlier workers have already created, which is why some files go missing. Second, two workers can glob the same leftover file and both try to unlink it; the slower one then raises the FileNotFoundError shown in the traceback, killing that worker before it ever creates its log file.
To avoid this, move the cleaning of the worker directory into the main function, before the executor starts. The directory is then cleaned exactly once, and no worker can delete another worker's log file.
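A minimal sketch of that rearrangement (cleanup_worker_dir and write_worker_file are illustrative names introduced here; write_worker_file stands in for the logging work done in print_multi):

```python
import concurrent.futures
from pathlib import Path


def cleanup_worker_dir(worker_dir: Path) -> None:
    # Runs exactly once, in the parent process, before the pool starts,
    # so no worker can race another worker's files.
    for path in worker_dir.glob("**/*"):
        if path.is_file():
            path.unlink()


def write_worker_file(index: int) -> str:
    # Stand-in for print_multi: each worker only creates its own file
    # and never deletes anything.
    out = Path("./logs/workers_print") / f"print_worker_{index}_ignore.txt"
    out.write_text(f"We're in worker {index}\n")
    return out.name


def main() -> list:
    worker_dir = Path("./logs/workers_print")
    worker_dir.mkdir(parents=True, exist_ok=True)
    cleanup_worker_dir(worker_dir)  # the cleanup moved out of the workers
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return sorted(executor.map(write_worker_file, range(4)))


if __name__ == "__main__":
    print(main())
```

If a per-worker cleanup must stay for some reason, path.unlink(missing_ok=True) (Python 3.8+) would at least suppress the FileNotFoundError, but workers would still delete each other's freshly created logs, so moving the cleanup out of the workers is the real fix.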