does close() imply flush() in Python?

Question:

In Python, and in general – does a close() operation on a file object imply a flush() operation?

Asked By: Adam Matan


Answers:

Yes. Closing a file object flushes its internal buffer before the underlying file descriptor is closed, so close() implies flush() (source).
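As a quick illustration (the filename demo.txt is just an assumed example), data written without an explicit flush() is visible after close(), because close() flushes first:

```python
# Minimal check: close() flushes the buffer before closing.
# "demo.txt" is an assumed example filename.
f = open("demo.txt", "w")
f.write("hello\n")   # may still sit in the process buffer here
f.close()            # implies flush(): the data is handed to the OS

with open("demo.txt") as f:
    print(f.read())  # the written data is there
```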

Answered By: Martin Wickman

NB: close() and flush() won’t ensure that the data is actually safe on the disk. They only ensure that the OS has the data, i.e. that it is no longer buffered inside the process.

You can try sync or fsync to get the data written to the disk.
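For example, a minimal sketch (the filename is an assumed example) of pushing the data all the way to the storage device with os.fsync:

```python
import os

# "important.log" is an assumed example filename.
with open("important.log", "w") as f:
    f.write("critical record\n")
    f.flush()               # process buffer -> OS page cache
    os.fsync(f.fileno())    # ask the OS to commit it to the disk
```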

Answered By: Douglas Leeder

Beware of a subtle pitfall here: in the session below, x.flush and x.close are typed without parentheses, so the methods are merely referenced (the REPL prints their reprs) and never actually called — which is why the data stays buffered inside the process and never reaches the file. Observe this session where I wrote to a file, suspended Python with Ctrl-Z back to the shell prompt, and examined the file:

$  cat xyz
ghi
$ fg
python

>>> x=open("xyz","a")
>>> x.write("morestuff\n")
>>> x.write("morestuff\n")
>>> x.write("morestuff\n")
>>> x.flush
<built-in method flush of file object at 0x7f58e0044660>
>>> x.close
<built-in method close of file object at 0x7f58e0044660>
>>> 
[1]+  Stopped                 python
$ cat xyz
ghi

Subsequently I can reopen the file, and that flushes the pending data (because, in this case, I open it in append mode). As the others have said, the sync syscall (available via the os package) should flush all buffers to disk, but it has system-wide performance implications (it syncs every file on the system).
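To make the pitfall explicit, here is a small sketch (the filename is an assumed example): referencing a method does nothing, you must call it with parentheses.

```python
# "xyz.txt" is an assumed example filename.
f = open("xyz.txt", "a")
f.write("morestuff\n")

f.flush    # only *names* the bound method; nothing is flushed
f.flush()  # actually calls it; the data now reaches the OS

print(open("xyz.txt").read())  # the new line is visible
f.close()
```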

Answered By: przemek

Yes, in Python 3 this is finally in the official documentation, but it was already the case in Python 2 (see Martin’s answer).

Answered By: Felix D.

As a complement to this question: yes, Python flushes before close; however, if you want to ensure that data is written safely to disk, this is not enough.

This is how I would write a file so that it is atomically updated on a UNIX/Linux server, whether the target file exists or not. Note that some filesystems will implicitly commit data to disk on close+rename (ext3 with data=ordered, the default; and ext4 initially uncovered many application flaws before adding detection of write-close-rename patterns and syncing data before metadata in those cases[1]).

import json
import os
import tempfile

# Write destfile, using a temporary name .<name>_XXXXXXXX
base, name = os.path.split(destfile)
tmpname = os.path.join(base, '.{}_'.format(name))  # This is the tmpfile prefix
with tempfile.NamedTemporaryFile('w', prefix=tmpname, delete=False) as fd:
    # Replace prefix with actual file path/name
    tmpname = str(fd.name)

    try:
        # Write fd here... ex:
        json.dump({}, fd)

        # We want to fdatasync before closing, so we need to flush before close anyway
        fd.flush()
        os.fdatasync(fd)

        # Since we're using tmpfile, we need to also set the proper permissions
        if os.path.exists(destfile):
            # Copy destination file's mask
            os.fchmod(fd.fileno(), os.stat(destfile).st_mode)
        else:
            # Set mask based on current umask value
            umask = os.umask(0o22)
            os.umask(umask)
            os.fchmod(fd.fileno(), 0o666 & ~umask)  # 0o777 for dirs and executable files

        # Now we can close and rename the file (overwriting any existing one)
        fd.close()
        os.rename(tmpname, destfile)
    except:
        # On error, try to cleanup the temporary file
        try:
            os.unlink(tmpname)
        except OSError:
            pass
        raise

IMHO it would have been nice if Python provided simple methods around this… At the same time, I guess if you care about data consistency, it’s probably best to really understand what is going on at a low level, especially since there are many differences across operating systems and filesystems.

Also note that this does not guarantee the written data can be recovered, only that you will get a consistent copy of the data (old or new). To ensure the new data is safely persisted when you return, you need to call os.fsync(...) after the rename, and even then, if there are unsafe caches in the write path, you could still lose data. This is common on consumer-grade hardware, although any system can be configured for unsafe writes, which also boosts performance. At least even with unsafe caches, the method above should still guarantee that whichever copy of the data you get is valid.
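As a sketch of that last step (POSIX-specific; the helper name is my own): after os.rename(), fsync the directory that contains the renamed entry so the rename itself is made durable.

```python
import os

# Hypothetical helper: make a rename durable by fsync-ing the
# directory that contains the renamed entry (POSIX-specific).
def fsync_dir_of(path):
    dirfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)   # commits the directory entry (the rename)
    finally:
        os.close(dirfd)

# Usage, continuing the snippet above:
# os.rename(tmpname, destfile)
# fsync_dir_of(destfile)
```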
