Python file read() and readline() counter?

Question:

It looks like Python keeps track of each call to read() and readline(). The position advances with every call, and once the end of the file is reached the calls return only an empty string. How can I find this counter and read a specific line at any time?

EDIT: My goal is to read a large file, a few GB in size with hundreds of thousands of lines. If this is an iterator then it is insufficient: I do not want to load the whole file into memory. How do I jump to a specific line without having to read unnecessary lines?

A text file with just 3 lines.

# cat sample.txt
This is a sample text file. This is line 1
This is line 2
This is line 3

# python
Python 3.7.5 (default, Nov  7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> file = open('sample.txt', 'r')
>>> file.readline()
'This is a sample text file. This is line 1\n'
>>> file.readline()
'This is line 2\n'
>>> file.readline()
'This is line 3\n'
>>> file.readline()
''
>>> file.readline()
''
>>> file.read()
''
>>> file.read(0)
''
>>> file.read()
''
>>>

# python
Python 3.7.5 (default, Nov  7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> file = open('sample.txt', 'r')
>>> file.read()
'This is a sample text file. This is line 1\nThis is line 2\nThis is line 3\n'
>>> file.read()
''
>>> file.readline()
''
>>>
Asked By: Majoris


Answers:

A file object in Python is an iterator over the lines of the file. You can use readlines() to read all the (remaining) lines at once into a list, or read() to read characters: by default everything that remains, or at most the number of characters you pass as a parameter. The default behaviour, if you iterate the file directly, is the same as with readline(), i.e. yielding the next line from the file.

You can combine that with enumerate to get another iterator yielding the line number along with each line (the first line having number 0 unless you specify enumerate’s start parameter), or to get a specific line:

>>> f = open("test.txt")
>>> lines = enumerate(f)
>>> next(lines)
(0, 'first line\n')
>>> next(lines)
(1, 'second line\n')
>>> next(lines)
(2, 'third line\n')

>>> f = open("test.txt")
>>> lines = enumerate(f)
>>> next(l for i, l in lines if i == 3)
'fourth line\n'
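
The same lookup can be written with itertools.islice, which skips the leading lines lazily and holds only one line in memory at a time. This is a minimal sketch rather than part of the original answer: read_line_at and demo_path are hypothetical names, and the throwaway file exists only to make the example self-contained. The skipped lines are still read from disk.

```python
import itertools
import os
import tempfile


def read_line_at(path, index):
    """Return the line at zero-based `index`, or None if the file is shorter."""
    with open(path) as f:
        # islice consumes and discards the first `index` lines, then yields one.
        return next(itertools.islice(f, index, index + 1), None)


# Throwaway four-line file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\nthird line\nfourth line\n")
    demo_path = tmp.name

fourth = read_line_at(demo_path, 3)
missing = read_line_at(demo_path, 10)  # past the end of the file
print(repr(fourth), repr(missing))

os.remove(demo_path)
```

Returning None for an out-of-range index is a design choice made here; raising IndexError would work just as well.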

There’s also the seek method, which can be used to jump to a specific position in the file. It is useful for “resetting” the file to the first position (as an alternative to re-opening it), but does not help much in finding a specific line unless you know the exact length of each line. (see below)

If you want to “read any line in any order”, the simplest way is to read all the lines into a list using readlines and then access items in that list (provided that your file is not too large).
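
As a small illustration of seek together with its counterpart tell, the sketch below rewinds a file and later returns to a remembered position. The file name demo_path and the sample contents are assumptions made only for this example.

```python
import os
import tempfile

# Throwaway two-line file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    demo_path = tmp.name

with open(demo_path) as f:
    first = f.readline()
    position = f.tell()        # marker for "just after the first line"
    rest = f.read()            # consumes everything that is left
    f.seek(0)                  # rewind to the start instead of re-opening
    first_again = f.readline()
    f.seek(position)           # go back to the remembered position
    second = f.readline()

print(repr(first_again), repr(second))

os.remove(demo_path)
```

In text mode the value returned by tell should be treated as an opaque marker to hand back to seek, not as a character count.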

>>> f = open("test.txt")
>>> lines = f.readlines()
>>> lines[3]
'fourth line\n'
>>> lines[1]
'second line\n'

My goal is to read a large file of a few Gb in size, hundreds of thousands of lines.

Since the only way for Python to know where a line ends, and thus where a particular line starts, is to count the \n characters it encounters, there’s no way around reading the file up to the line you want. If the file is very large and you have to repeatedly read lines out of order, it might make sense to read the file once, one line at a time, storing the starting position of each line in a dictionary. Afterwards, you can use seek to quickly jump to and then read a particular line.

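# Binary mode: len(line) is then a byte count and seek() takes plain byte offsets.
f = open("test.txt", "rb")
offset = 0
lines = {}
for i, line in enumerate(f):
    lines[i] = offset  # byte offset at which line i starts
    offset += len(line)
# jump to and read individual lines (returned as bytes; use .decode() for str)
f.seek(lines[3])
print(f.readline())
f.seek(lines[0])
print(f.readline())
f.close()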
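
If the file has to stay in text mode, the positions handed to seek should come from tell rather than from character counts, because characters and bytes can differ (multi-byte encodings, newline translation). The following is a sketch of that variant, not the original author's code: build_line_index and demo_path are hypothetical names. It uses readline in a loop because tell is disabled while a text file is being iterated with a for loop.

```python
import os
import tempfile


def build_line_index(f):
    """Return a list of tell() positions, one per line start, for text file `f`."""
    offsets = []
    f.seek(0)
    while True:
        pos = f.tell()          # position before reading the next line
        if not f.readline():    # empty string means end of file
            break
        offsets.append(pos)
    return offsets


# Throwaway file containing a non-ASCII character, so that character counts
# and byte offsets differ.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("na\u00efve first line\nsecond line\nthird\n")
    demo_path = tmp.name

with open(demo_path, encoding="utf-8") as f:
    line_index = build_line_index(f)   # one pass over the whole file
    f.seek(line_index[2])
    third = f.readline()
    f.seek(line_index[0])
    first = f.readline()

print(repr(third), repr(first))

os.remove(demo_path)
```

The index costs one full pass over the file, but only the positions are kept in memory, not the lines themselves.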
Answered By: tobias_k

The file object (i.e. the result of open(file)) is an iterator, and each call to readline() simply advances it by one line. There is no line counter, per se; the object only tracks its current position in the file. This can be observed if you run file.__next__() in place of file.readline().
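
A short sketch of that observation, assuming Python 3 and a throwaway file (demo_path is a name made up for this example): next(f), which calls f.__next__(), and f.readline() draw from the same position, and they differ only at the end of the file.

```python
import os
import tempfile

# Throwaway three-line file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line 1\nline 2\nline 3\n")
    demo_path = tmp.name

with open(demo_path) as f:
    a = f.readline()      # first line
    b = next(f)           # second line, via f.__next__()
    c = f.readline()      # third line
    at_end = f.readline() # readline() signals end of file with ''
    try:
        next(f)           # the iterator protocol signals it with StopIteration
        exhausted = False
    except StopIteration:
        exhausted = True

print(repr(a), repr(b), repr(c), repr(at_end), exhausted)

os.remove(demo_path)
```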

The simple solution, if you don’t mind reading the whole file at once, is to create a list of all the lines and then reference the ones you’re interested in:

lines = file.readlines()  # this is a list
Answered By: jpf