How to do unit testing of functions writing files using Python's 'unittest'

Question:

I have a Python function that writes an output file to disk.

I want to write a unit test for it using Python’s unittest module.

How should I assert equality of files? I would like to get an error if the file content differs from the expected one + list of differences. As in the output of the Unix diff command.

Is there an official or recommended way of doing that?

Asked By: jan

||

Answers:

The simplest thing is to write the output file, then read its contents, read the contents of the gold (expected) file, and compare them with simple string equality. If they are the same, delete the output file. If they are different, raise an assertion.

This way, when the tests are done, every failed test will be represented with an output file, and you can use a third-party tool to diff them against the gold files (Beyond Compare is wonderful for this).

If you really want to provide your own diff output, remember that the Python stdlib has the difflib module. The new unittest support in Python 3.1 includes an assertMultiLineEqual method that uses it to show diffs, similar to this:

    def assertMultiLineEqual(self, first, second, msg=None):
        """Assert that two multi-line strings are equal.

        If they aren't, show a nice diff.

        """
        self.assertTrue(isinstance(first, str),
                'First argument is not a string')
        self.assertTrue(isinstance(second, str),
                'Second argument is not a string')

        if first != second:
            message = ''.join(difflib.ndiff(first.splitlines(True),
                                                second.splitlines(True)))
            if msg:
                message += " : " + msg
            self.fail("Multi-line strings are unequal:n" + message)
Answered By: Ned Batchelder

You could separate the content generation from the file handling. That way, you can test that the content is correct without having to mess around with temporary files and cleaning them up afterward.

If you write a generator method that yields each line of content, then you can have a file handling method that opens a file and calls file.writelines() with the sequence of lines. The two methods could even be on the same class: test code would call the generator, and production code would call the file handler.

Here’s an example that shows all three ways to test. Usually, you would just pick one, depending on what methods are available on the class to test.

import os
from io import StringIO
from unittest.case import TestCase


class Foo(object):
    def save_content(self, filename):
        with open(filename, 'w') as f:
            self.write_content(f)

    def write_content(self, f):
        f.writelines(self.generate_content())

    def generate_content(self):
        for i in range(3):
            yield u"line {}n".format(i)


class FooTest(TestCase):
    def test_generate(self):
        expected_lines = ['line 0n', 'line 1n', 'line 2n']
        foo = Foo()

        lines = list(foo.generate_content())

        self.assertEqual(expected_lines, lines)

    def test_write(self):
        expected_text = u"""
line 0
line 1
line 2
"""
        f = StringIO()
        foo = Foo()

        foo.write_content(f)

        self.assertEqual(expected_text, f.getvalue())

    def test_save(self):
        expected_text = u"""
line 0
line 1
line 2
"""
        foo = Foo()

        filename = 'foo_test.txt'
        try:
            foo.save_content(filename)

            with open(filename, 'rU') as f:
                text = f.read()
        finally:
            os.remove(filename)

        self.assertEqual(expected_text, text)
Answered By: Don Kirkby

I prefer to have output functions explicitly accept a file handle (or file-like object), rather than accept a file name and opening the file themselves. This way, I can pass a StringIO object to the output function in my unit test, then .read() the contents back from that StringIO object (after a .seek(0) call) and compare with my expected output.

For example, we would transition code like this

##File:lamb.py
import sys


def write_lamb(outfile_path):
    with open(outfile_path, 'w') as outfile:
        outfile.write("Mary had a little lamb.n")


if __name__ == '__main__':
    write_lamb(sys.argv[1])



##File test_lamb.py
import unittest
import tempfile

import lamb


class LambTests(unittest.TestCase):
    def test_lamb_output(self):
        outfile_path = tempfile.mkstemp()[1]
        try:
            lamb.write_lamb(outfile_path)
            contents = open(tempfile_path).read()
        finally:
            # NOTE: To retain the tempfile if the test fails, remove
            # the try-finally clauses
            os.remove(outfile_path)
        self.assertEqual(contents, "Mary had a little lamb.n")

to code like this

##File:lamb.py
import sys


def write_lamb(outfile):
    outfile.write("Mary had a little lamb.n")


if __name__ == '__main__':
    with open(sys.argv[1], 'w') as outfile:
        write_lamb(outfile)



##File test_lamb.py
import unittest
from io import StringIO

import lamb


class LambTests(unittest.TestCase):
    def test_lamb_output(self):
        outfile = StringIO()
        # NOTE: Alternatively, for Python 2.6+, you can use
        # tempfile.SpooledTemporaryFile, e.g.,
        #outfile = tempfile.SpooledTemporaryFile(10 ** 9)
        lamb.write_lamb(outfile)
        outfile.seek(0)
        content = outfile.read()
        self.assertEqual(content, "Mary had a little lamb.n")

This approach has the added benefit of making your output function more flexible if, for instance, you decide you don’t want to write to a file, but some other buffer, since it will accept all file-like objects.

Note that using StringIO assumes the contents of the test output can fit into main memory. For very large output, you can use a temporary file approach (e.g., tempfile.SpooledTemporaryFile).

Answered By: gotgenes

Based on suggestions I did the following.

class MyTestCase(unittest.TestCase):
    def assertFilesEqual(self, first, second, msg=None):
        first_f = open(first)
        first_str = first_f.read()
        second_f = open(second)
        second_str = second_f.read()
        first_f.close()
        second_f.close()

        if first_str != second_str:
            first_lines = first_str.splitlines(True)
            second_lines = second_str.splitlines(True)
            delta = difflib.unified_diff(first_lines, second_lines, fromfile=first, tofile=second)
            message = ''.join(delta)

            if msg:
                message += " : " + msg

            self.fail("Multi-line strings are unequal:n" + message)

I created a subclass MyTestCase as I have lots of functions that need to read/write files so I really need to have re-usable assert method. Now in my tests, I would subclass MyTestCase instead of unittest.TestCase.

What do you think about it?

Answered By: jan
import filecmp

Then

self.assertTrue(filecmp.cmp(path1, path2))
Answered By: tbc0

I always try to avoid writing files to disk, even if it’s a temporary folder dedicated to my tests: not actually touching the disk makes your tests much faster, especially if you interact with files a lot in your code.

Suppose you have this “amazing” piece of software in a file called main.py:

"""
main.py
"""

def write_to_file(text):
    with open("output.txt", "w") as h:
        h.write(text)

if __name__ == "__main__":
    write_to_file("Every great dream begins with a dreamer.")

To test the write_to_file method, you can write something like this in a file in the same folder called test_main.py:

"""
test_main.py
"""
from unittest.mock import patch, mock_open

import main


def test_do_stuff_with_file():
    open_mock = mock_open()
    with patch("main.open", open_mock, create=True):
        main.write_to_file("test-data")

    open_mock.assert_called_with("output.txt", "w")
    open_mock.return_value.write.assert_called_once_with("test-data")
Answered By: Enrico Marchesin

If you can use it, I’d strongly recommend using the following library: http://pyfakefs.org

pyfakefs creates an in-memory fake filesystem, and patches all filesystem accesses with accesses to the fake filesystem. This means you can write your code as you normally would, and in your unit tests, just make sure to initialize the fake filesystem in your setup.

from pyfakefs.fake_filesystem_unittest import TestCase

class ExampleTestCase(TestCase):
    def setUp(self):
        self.setUpPyfakefs()

    def test_create_file(self):
        file_path = '/test/file.txt'
        self.assertFalse(os.path.exists(file_path))
        self.fs.create_file(file_path)
        self.assertTrue(os.path.exists(file_path))

There is also a pytest plugin if you prefer to run your tests with pytest.

There are some caveats to this approach – you can’t use it if you’re accessing the filesystem via C libraries, because pyfakefs can’t use patch those calls. However, if you’re working in pure Python, I’ve found it to be a tremendously useful library.

Answered By: Herbert Lee
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.