Python unit test that uses an external data file

Question:

I have a Python project that I’m working on in Eclipse and I have the following file structure:

/Project
    /projectname
        module1.py
        module2.py 
        # etc.
    /test
        testModule1.py
        # etc.
        testdata.csv

In one of my tests I create an instance of one of my classes giving 'testdata.csv' as a parameter. This object does open('testdata.csv') and reads the contents.

If I run just this single test file with unittest, everything works and the file is found and read properly. However, if I try to run all my unit tests (i.e. by right-clicking the test directory rather than the individual test file), I get an error saying the file could not be found.

Is there any way to get around this (other than providing an absolute path, which I’d prefer not to do)?

Asked By: user123959


Answers:

Your tests should not open the file directly; each test should copy the file and work with its own copy.
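A minimal sketch of how per-test copies might be set up, assuming unittest and the standard tempfile and shutil modules. The class name CopiedDataTestCase and the attributes source_file and data_path are hypothetical names chosen for illustration. Note that the original file still has to be located somehow; copying only protects it from modification.

```python
import shutil
import tempfile
import unittest
from pathlib import Path


class CopiedDataTestCase(unittest.TestCase):
    """Base class that gives every test its own private copy of a data file."""

    # Hypothetical attribute: subclasses set this to the original data file.
    source_file = None

    def setUp(self):
        # A fresh temporary directory per test, removed again after the test.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        # shutil.copy returns the path of the newly created copy.
        self.data_path = Path(shutil.copy(self.source_file, self._tmp.name))
```

A test class would subclass this, set source_file to the real data file, and pass self.data_path to the code under test.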

Answered By: Rei Fly

Usually what I do is define

import os
THIS_DIR = os.path.dirname(os.path.abspath(__file__))

at the top of each test module. Then it doesn’t matter what working directory you’re in, because the file path is always resolved relative to where the test module sits.

Then I use something like this in my test (or test setup):

my_data_path = os.path.join(THIS_DIR, os.pardir, 'data_folder/data.csv')

Or in your case, since the data source is in the test directory:

my_data_path = os.path.join(THIS_DIR, 'testdata.csv')

Edit: for modern Python, using pathlib:

from pathlib import Path

THIS_DIR = Path(__file__).parent

my_data_path = THIS_DIR.parent / 'data_folder/data.csv'

# or if it's in the same directory
my_data_path = THIS_DIR / 'testdata.csv'
Answered By: Jamie Bull

Unit tests that access the file system are generally not a good idea. A test should be self-contained; when the test data lives outside the test, it’s no longer immediately obvious which test the CSV file belongs to, or even whether it’s still in use.

A preferable solution is to patch open and make it return a file-like object.

from unittest import TestCase
from unittest.mock import patch, mock_open

from textwrap import dedent

class OpenTest(TestCase):
    DATA = dedent("""
        a,b,c
        x,y,z
        """).strip()

    @patch("builtins.open", mock_open(read_data=DATA))
    def test_open(self):

        # Because builtins.open is patched, any module that calls `open` for
        # the duration of this test gets the mock instead (not just the test
        # module).
        with open("filename", "r") as f:
            result = f.read()

        open.assert_called_once_with("filename", "r")
        self.assertEqual(self.DATA, result)
        self.assertEqual("a,b,c\nx,y,z", result)
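
To relate this to the question, where the class under test opens the file itself, here is a rough sketch. CsvSource is a hypothetical stand-in for the asker's class, and the sketch assumes it reads the whole file with read(); the patch is applied with a context manager instead of a decorator.

```python
from unittest import TestCase
from unittest.mock import patch, mock_open


class CsvSource:
    """Hypothetical stand-in for the class under test in the question."""

    def __init__(self, path):
        # The class opens the file itself, as described in the question.
        with open(path) as f:
            self.rows = [line.split(",") for line in f.read().splitlines()]


class CsvSourceTest(TestCase):
    def test_reads_rows_without_touching_disk(self):
        fake_open = mock_open(read_data="a,b,c\nx,y,z")
        # While the patch is active, CsvSource receives the mock file handle.
        with patch("builtins.open", fake_open):
            source = CsvSource("testdata.csv")

        fake_open.assert_called_once_with("testdata.csv")
        self.assertEqual([["a", "b", "c"], ["x", "y", "z"]], source.rows)
```

No file named testdata.csv needs to exist for this test, so the working directory no longer matters.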
Answered By: Dunes

In my opinion the best way to handle these cases is to program via inversion of control.

In the two sections below I first show what a solution without inversion of control looks like. The second section shows a solution with inversion of control and how that code can be tested without a mocking framework.

At the end I list some personal pros and cons that do not claim to be correct or complete. Feel free to comment with additions and corrections.

No inversion of control (no dependency injection)

You have a class that uses Python’s built-in open function.

class UsesOpen(object):
  def some_method(self, path):
    with open(path) as f:
      process(f)

# how the class is used in regular (non-test) code
def main():
  uses_open = UsesOpen()
  uses_open.some_method('/my/path')

Here open is called directly in my code, so the only ways to write tests for it are to use real test data files or a mocking framework, as Dunes suggests.
But there is another way:

My suggestion: Inversion of control (with dependency injection)

Now I rewrite the class so that it receives open from the outside:

class UsesOpen(object):
  def __init__(self, myopen):
    self.__open = myopen

  def some_method(self, path):
    with self.__open(path) as f:
      process(f)

# how the class is used in regular (non-test) code
def main():
  uses_open = UsesOpen(open)
  uses_open.some_method('/my/path')

In this second example I inject the open dependency through the constructor (constructor dependency injection).

Writing tests for inversion of control

Now I can easily write tests and use my test version of open when I need it:

EXAMPLE_CONTENT = """my file content
as an example
this can be anything"""

TEST_FILES = {
  '/my/long/fake/path/to/a/file.conf': EXAMPLE_CONTENT
}

class MockFile(object):
  def __init__(self, content):
    self.__content = content
  def read(self):
    return self.__content

  def __enter__(self):
    return self
  def __exit__(self, type, value, tb):
    pass

class MockFileOpener(object):
  def __init__(self, test_files):
    self.__test_files = test_files

  def open(self, path, *args, **kwargs):
    return MockFile(self.__test_files[path])

class TestUsesOpen(object):
  def test_some_method(self):
    test_opener = MockFileOpener(TEST_FILES)

    uses_open = UsesOpen(test_opener.open)

    # assert that uses_open.some_method('/my/long/fake/path/to/a/file.conf')
    # does the right thing
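
As a rough, self-contained variant of the test above: the standard io.StringIO class already behaves like an opened text file (it supports read() and the with statement), so it could stand in for a hand-written MockFile. In this sketch, fake_open is a hypothetical helper, and returning the upper-cased content is only a placeholder for whatever process(f) does in the real code.

```python
import io


class UsesOpen:
    def __init__(self, myopen):
        self._open = myopen

    def some_method(self, path):
        with self._open(path) as f:
            # Placeholder for process(f): an assumption made for this sketch.
            return f.read().upper()


# Fake "file system": maps paths to file contents.
FAKE_FILES = {"/my/long/fake/path/to/a/file.conf": "my file content"}


def fake_open(path, *args, **kwargs):
    """Hypothetical replacement for open that serves in-memory content."""
    return io.StringIO(FAKE_FILES[path])


def test_some_method():
    uses_open = UsesOpen(fake_open)
    result = uses_open.some_method("/my/long/fake/path/to/a/file.conf")
    assert result == "MY FILE CONTENT"
```

Production code would still construct the object as UsesOpen(open); only the test swaps in the fake.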

Pro/Con

Pro Dependency Injection

  • no need to learn a mocking framework for tests
  • complete control over the classes and methods that have to be faked
  • also changing and evolving your code is easier in general
  • code quality normally improves, as one of the most important
    factors is being able to respond to changes as easily as possible
  • using dependency injection and a dependency injection framework
    is generally a respected way to work on a project https://en.wikipedia.org/wiki/Dependency_injection

Con Dependency Injection

  • a little bit more code to write in general
  • tests are not as short as when patching a class via @patch
  • constructors can get overloaded with dependencies
  • you need to learn how to use dependency injection
Answered By: Jan

For test discovery it is recommended to make your test folder a package. In that case you can access resources in the test folder using importlib.resources (mind the Python version compatibility of the individual functions; backports are available as importlib_resources), for example:

import importlib.resources

test_file_path_str = str(importlib.resources.files('tests').joinpath('testdata.csv'))
test_function_expecting_filename(test_file_path_str)

This way you do not need to infer file locations from your code.
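
To try this out without an existing project, the following sketch builds a throwaway package in a temporary directory and reads a data file from it. It assumes Python 3.9 or later, where importlib.resources.files is available; the package name demo_tests_pkg is made up for the demonstration.

```python
import importlib.resources
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Build a tiny package containing a data file (names are hypothetical).
    pkg_dir = Path(root) / "demo_tests_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "testdata.csv").write_text("a,b,c\nx,y,z\n")

    # Make the package importable for the duration of the demonstration.
    sys.path.insert(0, root)
    try:
        resource = importlib.resources.files("demo_tests_pkg").joinpath("testdata.csv")
        # The resource is located through the package, not the working directory.
        content = resource.read_text()
    finally:
        sys.path.remove(root)
```

In a real project the package would simply be the test folder itself, with an __init__.py file next to the data file.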

Answered By: user1587520