pytest, xdist, and sharing generated file dependencies
Question:
I have multiple tests that require an expensive-to-generate file.
I’d like the file to be re-generated on every test run, but no more than once.
To complicate the matter, both these tests as well as the file depend on an input parameter.
from pathlib import Path

from pytest import mark

def expensive(param) -> Path:
    # Generate file and return its path.
    ...

@mark.parametrize('input', TEST_DATA)
class TestClass:
    def test_one(self, input) -> None:
        check_expensive1(expensive(input))

    def test_two(self, input) -> None:
        check_expensive2(expensive(input))
How can I make sure that this file is not regenerated multiple times, even when running these tests in parallel?
For context, I’m porting test infrastructure that uses Makefiles to pytest.
I’d be OK with using file-based locks to synchronize, but I’m sure someone else has had this problem and would rather use an existing solution.
Using functools.cache works great for a single thread. Fixtures with scope="module" don’t work at all, because the parameter input is at function scope.
Answers:
There’s an existing solution in the pytest-xdist documentation section "Making session-scoped fixtures execute only once":
import json

import pytest
from filelock import FileLock


@pytest.fixture(scope="session")
def session_data(tmp_path_factory, worker_id):
    if worker_id == "master":
        # not executing with multiple workers, just produce the data and let
        # pytest's fixture caching do its job
        return produce_expensive_data()

    # get the temp directory shared by all workers
    root_tmp_dir = tmp_path_factory.getbasetemp().parent

    fn = root_tmp_dir / "data.json"
    with FileLock(str(fn) + ".lock"):
        if fn.is_file():
            data = json.loads(fn.read_text())
        else:
            data = produce_expensive_data()
            fn.write_text(json.dumps(data))
    return data
Note that filelock is not part of the standard library; it is available from PyPI and has its own documentation.
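Since the question’s file also depends on a per-test parameter, the same lock-guarded check-then-generate pattern can be keyed on the parameter instead of being session-scoped. Here is a sketch of one way to adapt it; TEST_DATA, the file name scheme, and _generate are hypothetical stand-ins for the question’s data and generation step.

```python
from pathlib import Path

import pytest
from filelock import FileLock

# Hypothetical stand-in for the question's parameter set.
TEST_DATA = ["small", "large"]

def shared_expensive_file(param: str, shared_dir: Path, generate) -> Path:
    """Generate the file for `param` at most once across all xdist workers.

    A per-param lock file serializes the check-then-generate step, so
    whichever worker arrives first does the work and the rest reuse it.
    """
    path = shared_dir / f"expensive-{param}.dat"
    with FileLock(str(path) + ".lock"):
        if not path.is_file():
            generate(param, path)
    return path

def _generate(param: str, path: Path) -> None:
    # Stand-in for the real expensive generation step.
    path.write_text(f"expensive result for {param}")

@pytest.fixture(params=TEST_DATA)
def expensive_file(request, tmp_path_factory, worker_id):
    # With xdist, getbasetemp().parent is shared by all workers; without
    # xdist (worker_id == "master"), getbasetemp() itself is fine.
    base = tmp_path_factory.getbasetemp()
    shared_dir = base if worker_id == "master" else base.parent
    return shared_expensive_file(request.param, shared_dir, _generate)
```

The tests would then take expensive_file instead of calling expensive(input) themselves, and the parametrization moves from the test class onto the fixture.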