Does reading an entire file leave the file handle open?

Question:

If you read an entire file with content = open('Path/to/file', 'r').read() is the file handle left open until the script exits? Is there a more concise method to read a whole file?

Asked By: tMC

||

Answers:

The answer to that question depends somewhat on the particular Python implementation.

To understand what this is all about, pay particular attention to the actual file object. In your code, that object is mentioned only once, in an expression, and becomes inaccessible immediately after the read() call returns.

This means that the file object is garbage. The only remaining question is “When will the garbage collector collect the file object?”.

in CPython, which uses a reference counter, this kind of garbage is noticed immediately, and so it will be collected immediately. This is not generally true of other python implementations.

A better solution, to make sure that the file is closed, is this pattern:

with open('Path/to/file', 'r') as content_file:
    content = content_file.read()

which will always close the file immediately after the block ends; even if an exception occurs.

Edit: To put a finer point on it:

Other than file.__exit__(), which is “automatically” called in a with context manager setting, the only other way that file.close() is automatically called (that is, other than explicitly calling it yourself,) is via file.__del__(). This leads us to the question of when does __del__() get called?

A correctly-written program cannot assume that finalizers will ever run at any point prior to program termination.

https://devblogs.microsoft.com/oldnewthing/20100809-00/?p=13203

In particular:

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

[…]

CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references.

https://docs.python.org/3.5/reference/datamodel.html#objects-values-and-types

(Emphasis mine)

but as it suggests, other implementations may have other behavior. As an example, PyPy has 6 different garbage collection implementations!

You can use pathlib.

For Python 3.5 and above:

from pathlib import Path
contents = Path(file_path).read_text()

For older versions of Python use pathlib2:

$ pip install pathlib2

Then:

from pathlib2 import Path
contents = Path(file_path).read_text()

This is the actual read_text implementation:

def read_text(self, encoding=None, errors=None):
    """
    Open the file in text mode, read it, and close the file.
    """
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
        return f.read()
Answered By: Eyal Levin

Well, if you have to read file line by line to work with each line, you can use

with open('Path/to/file', 'r') as f:
    s = f.readline()
    while s:
        # do whatever you want to
        s = f.readline()

Or even better way:

with open('Path/to/file') as f:
    for line in f:
        # do whatever you want to
Answered By: Kirill

Instead of retrieving the file content as a single string,
it can be handy to store the content as a list of all lines the file comprises:

with open('Path/to/file', 'r') as content_file:
    content_list = content_file.read().strip().split("n")

As can be seen, one needs to add the concatenated methods .strip().split("n") to the main answer in this thread.

Here, .strip() just removes whitespace and newline characters at the endings of the entire file string,
and .split("n") produces the actual list via splitting the entire file string at every newline character n.

Moreover,
this way the entire file content can be stored in a variable, which might be desired in some cases, instead of looping over the file line by line as pointed out in this previous answer.

Answered By: Andreas L.
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.