Why does Python seem to behave differently between a call with pytest and a "manual" call?

Question:

Context

I want to test the behavior of a singleton class in a multiprocessing environment, because it has been brought to my attention that it does not work properly. It seems the same object is being used in two different processes.

Minimal example

  • Python 3.8.3
  • Windows 10 enterprise X64
  • pytest 6.1.2

singleton.py

from threading import Lock
class SingletonMeta(type):
    _instances = {}
    _lock = Lock()
    def __call__(cls, *args, **kwargs):
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

manual_test.py

from singleton import SingletonMeta
from multiprocessing import Pool

class Test(metaclass=SingletonMeta):
    def __init__(self, value):
        self.value = value

def do(value):
    t = Test(value)
    return t.value, id(t)

if __name__ == "__main__":
    with Pool(2) as pool:
        values = [1, 2]
        results = pool.map(do, values)
        results.sort()
        print("results: ", results)

test_multiprocessing.py

from singleton import SingletonMeta
import pytest
from multiprocessing import Pool


class Singleton(metaclass=SingletonMeta):
    def __init__(self, value):
        self.value = value

def do(value):
    t = Singleton(value=value)
    return t.value, id(t)

def test_multiprocess():
    # given
    value1 = 1
    value2 = 2
    # when
    with Pool(2) as pool:
        results = pool.map(do, [value1, value2])
    
    results.sort()
    print("results", results)
    # then
    assert results[0][0] == value1
    assert results[1][0] == value2

manual_test.py output

$ python manual_test.py 
results:  [(1, 2076407268784), (1, 2076407268784)]

As you can see, the value and the object’s id are the same for both processes.

pytest output

note: truncated to remove noise

$ pytest -rA -v -p no:faulthandler
results [(1, 2356600174768), (2, 2732965816496)]

As you can see, both the values and the object’s id are different.

Problem

Given that these two programs have almost exactly the same code, I was expecting the same behavior from both:

  • The first value of each result should be the same
  • The id of the objects should be the same

However, this is only the case when calling manual_test.py, not with the pytest utility. My final goal is to have my class work in multiprocessing and test it in my library, hence I would like to know:

  • why is pytest behaving differently? (or the other way around, I am not sure which one is the "correct" behavior)
  • if the pytest behavior is "incorrect" (or at least unexpected), how can I fix it?
Asked By: Itération 122442

Answers:

The manual run returns identical results because of timing (or rather, because the tasks consume so little time and resources): both tasks end up on the same worker process. I'm not versed in how Python allocates pooled processes to tasks, but this is easy to demonstrate by also returning the process id:

import os

def do(value):
    t = Test(value)
    return t.value, id(t), os.getpid()

will result in

results:  [(1, 4311792512, 24184), (1, 4311792512, 24184)]

Both results are the same, as you already observed, and they come from the same process. However, adding some computation or a delay to this function changes the results:

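import os
import time

def do(value):
    t = Test(value)
    time.sleep(0.1)  # keep this worker busy so the other one picks up the second task
    return t.value, id(t), os.getpid()

will result in
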
results:  [(1, 4321196928, 24285), (2, 4338334592, 24286)]

or

def do(value):
    t = Test(value)
    for x in range(100000):
        pass
    return t.value, id(t), os.getpid()

will result in

results:  [(1, 4387257264, 24358), (2, 4338498480, 24359)]

pytest might be adding just enough overhead to cause the difference.
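
To read such outputs at a glance, the tuples can be grouped by process id. This is only a small sketch: the helper name group_by_worker is made up for illustration, and the data is simply the two outputs quoted above (without and with the sleep).

```python
from collections import defaultdict


def group_by_worker(results):
    """Group (value, object_id, pid) tuples by the pid that produced them."""
    by_pid = defaultdict(list)
    for value, object_id, pid in results:
        by_pid[pid].append((value, object_id))
    return dict(by_pid)


# Outputs quoted above: the original do() and the version with time.sleep(0.1).
fast_run = [(1, 4311792512, 24184), (1, 4311792512, 24184)]
slow_run = [(1, 4321196928, 24285), (2, 4338334592, 24286)]

print(len(group_by_worker(fast_run)))  # 1: a single worker handled both tasks
print(len(group_by_worker(slow_run)))  # 2: each task ran in its own worker
```

When a single pid shows up, the identical id only means that one process reused its own singleton; it says nothing about sharing between processes.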


The main issue here is the difference between threads and processes. threading.Lock works well for threads, but a thread lock has no effect on a different process. You can switch to multiprocessing.Lock for the singleton, but that only prevents simultaneous creation; it does not guarantee a single instance across all processes. This is because, even after an instance has been created, the cls._instances dictionary is not shared or synced between processes: each process keeps its own copy.
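
For contrast, here is a minimal sketch of the case the metaclass does handle: several threads inside one process. The SingletonMeta is the one from the question; the Config class and the use of ThreadPoolExecutor are assumptions made for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class SingletonMeta(type):
    _instances = {}
    _lock = Lock()

    def __call__(cls, *args, **kwargs):
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Config(metaclass=SingletonMeta):  # hypothetical class, for illustration only
    def __init__(self, value):
        self.value = value


def do(value):
    t = Config(value)
    return t.value, id(t)


# All threads live in the same process, so they share _instances and _lock.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(do, range(8)))

distinct_ids = {object_id for _, object_id in results}
distinct_values = {value for value, _ in results}
print(len(distinct_ids), len(distinct_values))  # 1 1
```

Whichever thread gets there first creates the instance and every other call returns that same object. Worker processes do not share this memory, which is why the same code yields one instance per process under a Pool.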

There are various synchronization mechanisms that can handle such coordination, but the best fit depends on the specific use-case – what is to be achieved, and what should be prevented.

Answered By: micromoses