Why does Python seem to behave differently between a call with pytest and a "manual" call?
Question:
Context
I want to test the behavior of a singleton class in a multiprocessing environment, because it has been brought to my attention that it does not work properly: it seems the same object is being used in two different processes.
Minimal example
- Python 3.8.3
- Windows 10 enterprise X64
- pytest 6.1.2
singleton.py
from threading import Lock


class SingletonMeta(type):
    _instances = {}
    _lock = Lock()

    def __call__(cls, *args, **kwargs):
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]
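For reference, within a single process this metaclass does behave as a singleton. A quick sanity check (Config is a hypothetical example class, not part of the project):

```python
from threading import Lock


class SingletonMeta(type):
    _instances = {}
    _lock = Lock()

    def __call__(cls, *args, **kwargs):
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Config(metaclass=SingletonMeta):
    def __init__(self, value):
        self.value = value


a = Config(1)
b = Config(2)  # arguments are ignored: the cached instance is returned
print(a is b, b.value)  # True 1
```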
manual_test.py
from singleton import SingletonMeta
from multiprocessing import Pool


class Test(metaclass=SingletonMeta):
    def __init__(self, value):
        self.value = value


def do(value):
    t = Test(value)
    return t.value, id(t)


if __name__ == "__main__":
    with Pool(2) as pool:
        values = [1, 2]
        results = pool.map(do, values)
    results.sort()
    print("results: ", results)
test_multiprocessing.py
from singleton import SingletonMeta
import pytest
from multiprocessing import Pool


class Singleton(metaclass=SingletonMeta):
    def __init__(self, value):
        self.value = value


def do(value):
    t = Singleton(value=value)
    return t.value, id(t)


def test_multiprocess():
    # given
    value1 = 1
    value2 = 2

    # when
    with Pool(2) as pool:
        results = pool.map(do, [value1, value2])
    results.sort()
    print("results", results)

    # then
    assert results[0][0] == value1
    assert results[1][0] == value2
manual_test.py output
$ python manual_test.py
results: [(1, 2076407268784), (1, 2076407268784)]
As you can see, the value and the object’s id are the same for both processes.
pytest output
note: truncated to remove noise
$ pytest -rA -v -p no:faulthandler
results [(1, 2356600174768), (2, 2732965816496)]
As you can see, both the values and the object’s id are different.
Problem
Given that these two programs contain almost exactly the same code, I expected the same behavior from both:
- The first value of each tuple should be the same
- The id of the objects should be the same
However, this is only the case when calling manual_test.py, not with the pytest utility. My final goal is to have my class work with multiprocessing and test it in my library, hence I would like to know:
- why is pytest behaving differently? (or the other way around; I am not sure which one is the "correct" behavior)
- if the pytest behavior is "incorrect" (or at least unexpected), how can I fix it?
Answers:
The manual test only appears to pass due to timing (or rather, due to the lack of time/resources spent per task); while I’m not versed in how Python decides to allocate pooled processes to given tasks, this can be demonstrated quite easily by checking the process id:
import os

def do(value):
    t = Test(value)
    return t.value, id(t), os.getpid()
will result in
results: [(1, 4311792512, 24184), (1, 4311792512, 24184)]
These are the same, as you already observed. However, adding some computation or time to this function changes the results:
import time

def do(value):
    t = Test(value)
    time.sleep(0.1)
    return t.value, id(t), os.getpid()
->
results: [(1, 4321196928, 24285), (2, 4338334592, 24286)]
or
def do(value):
    t = Test(value)
    for x in range(100000):
        pass
    return t.value, id(t), os.getpid()
->
results: [(1, 4387257264, 24358), (2, 4338498480, 24359)]
pytest might be adding just enough overhead to cause the difference.
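To take timing out of the picture entirely, the pool can be told to retire each worker after a single task (maxtasksperchild=1), which forces the two tasks into different processes regardless of how fast they finish. A self-contained sketch (the thread lock is omitted here for brevity, since it plays no role across processes):

```python
import os
from multiprocessing import Pool


class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Test(metaclass=SingletonMeta):
    def __init__(self, value):
        self.value = value


def do(value):
    t = Test(value)
    return t.value, os.getpid()


def run():
    # maxtasksperchild=1 makes each worker exit after one task, so the
    # two tasks necessarily run in two distinct (fresh) processes
    with Pool(2, maxtasksperchild=1) as pool:
        return sorted(pool.map(do, [1, 2], chunksize=1))


if __name__ == "__main__":
    print(run())
```

Since each task now sees a fresh process with an empty `_instances` dictionary, both values survive, mirroring the pytest result.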
The main issue here is the difference between threads and processes: while threading.Lock
works well for threads, a thread lock has no effect on a different process. On the other hand, while you can switch to multiprocessing.Lock
for the singleton, it can only prevent simultaneous creation of the instance; it does not give you one single instance across all processes. This is because even after an instance is created, the cls._instances
dictionary is not shared/synced across processes.
There are various synchronization mechanisms that can handle such coordination, but the best fit depends on the specific use-case – what is to be achieved, and what should be prevented.
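As one illustration: if the goal is merely to agree on a single shared value across workers, a multiprocessing.Manager dict can coordinate it. This is a sketch of that idea, not a cross-process singleton; it relies on the fact that each proxy method call (here, setdefault) is a single round-trip to the manager process and is therefore effectively atomic:

```python
from multiprocessing import Manager, Pool


def do(args):
    value, shared = args
    # the first worker to arrive stores its value; every later worker
    # gets that stored value back instead of its own
    return shared.setdefault("value", value)


def run():
    with Manager() as manager:
        shared = manager.dict()
        with Pool(2) as pool:
            return pool.map(do, [(1, shared), (2, shared)])


if __name__ == "__main__":
    print(run())  # both entries are equal: either [1, 1] or [2, 2]
```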