Ambiguous output with Python and files

Question:

I was testing how file streams work in Python and wrote the following code:

with open('test2.txt') as f:
    while r := f.read(1):
        print(repr(r), f.tell(), sep='\tindex:', end='\n***************\n')

Contents of test2.txt are as follows:

012345

6789

I ran the code and the output is as follows:

'0'     index:1
***************
'1'     index:2
***************
'2'     index:3
***************
'3'     index:4
***************
'4'     index:5
***************
'5'     index:18446744073709551623
***************
'\n'    index:8
***************
'\n'    index:10
***************
'6'     index:11
***************
'7'     index:12
***************
'8'     index:13
***************
'9'     index:14
***************

Could someone please help me understand why f.tell() returns 18446744073709551623, and also why '\n' has index 8 instead of 7, if we assume '5' should get index 6? Thank you in advance.

Asked By: Shrehan Raj Singh


Answers:

For files opened in text mode, the Python documentation describes the value returned by f.tell() as an opaque number that does not usually represent a number of bytes in the underlying file. It is meant to be handed back to f.seek() later; it is not a character index.

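In your code snippet, f.tell() is called immediately after f.read(1) on a file opened in text mode, so the value it returns is not guaranteed to be a simple count of the characters read so far. To get predictable numbers, either open the file in binary mode ('rb'), where tell() is a plain byte offset, or avoid interpreting f.tell() right after f.read() or f.readline().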
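As a rough illustration of the binary-mode alternative, the sketch below reads a file one byte at a time and records tell() after each read. The helper name byte_offsets and the sample bytes are made up for this example; the temporary file only stands in for test2.txt.

```python
import os
import tempfile


def byte_offsets(raw):
    """Write raw bytes to a temp file, then read it back one byte at a
    time in binary mode, returning (byte, tell() after the read) pairs."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        result = []
        with open(path, "rb") as f:  # binary mode: no newline translation
            while b := f.read(1):
                result.append((b, f.tell()))
        return result
    finally:
        os.remove(path)


# Hypothetical sample: '5', a Windows line break, then '6'.
for byte, offset in byte_offsets(b"5\r\n6"):
    print(repr(byte), offset)
```

In binary mode every read(1) advances the position by exactly one, and \r and \n show up as separate bytes.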

As for the newline showing index 8 instead of 7: '\n' is a single character in Python, but it is not necessarily a single byte on disk. The final position of 14 for only 12 characters suggests the file was saved with Windows line endings, where each line break is stored as the two bytes \r\n. In text mode Python translates each \r\n pair into one '\n', so reading that one character moves the file position forward by two bytes: from 6 to 8 for the first newline, and from 8 to 10 for the second.
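To make the character-versus-byte difference concrete, here is a small sketch that assumes the file contains Windows line endings (the question does not show the raw bytes, so this is a guess based on the final position of 14). The helper name char_and_byte_lengths is hypothetical.

```python
import os
import tempfile


def char_and_byte_lengths(raw):
    """Write raw bytes to a temp file and return
    (characters seen in text mode, bytes stored on disk)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Default newline=None: \r\n is translated to a single '\n' on read.
        with open(path, "r", encoding="ascii") as f:
            text = f.read()
        return len(text), os.path.getsize(path)
    finally:
        os.remove(path)


# Assumed on-disk contents of test2.txt, with \r\n line breaks.
chars, size = char_and_byte_lengths(b"012345\r\n\r\n6789")
print(chars, size)  # 12 14
```

The two-unit gap between the counts corresponds to the two \r bytes that text mode folds away, which is why the positions jump by 2 at each line break.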

This is consistent with how file streams work: the reported position is where the next read or write would happen, which is just past the last byte consumed. The seemingly "skipped" index 7 belongs to the \r half of the \r\n pair, which is folded into the single '\n' you see.

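Regarding the unexpectedly large value 18446744073709551623: it equals 2**64 + 7, so it is most likely not a corrupted offset but a text-mode position that also encodes decoder state. After returning '5', the newline decoder has apparently already consumed the following \r (byte position 7) and is waiting to see whether a \n comes next, and tell() records that pending state in the high bits of the number. It is not a bug as far as I can tell, but it does mean you should not rely on tell() as an index right after a read() in text mode.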
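The arithmetic can be checked with a short sketch. It assumes CPython's internal layout for text-mode positions (low 64 bits hold the byte position, the next 64 bits hold decoder flags); that layout is an implementation detail rather than a documented API, and split_tell_cookie is a name invented here.

```python
MASK_64 = 2**64 - 1


def split_tell_cookie(cookie):
    """Split a text-mode tell() value into (byte position, decoder flags).

    Assumption: low 64 bits are the byte position and the next 64 bits are
    the decoder flags, as in CPython's implementation (not a public API).
    """
    position = cookie & MASK_64
    dec_flags = (cookie >> 64) & MASK_64
    return position, dec_flags


print(split_tell_cookie(18446744073709551623))  # (7, 1)
print(split_tell_cookie(8))                     # (8, 0)
```

Under that assumption the odd value decodes to byte position 7 with one flag bit set, which fits the "pending \r" explanation; the later values such as 8 and 10 carry no flags and so look like ordinary offsets.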

See also

Python file.tell() giving strange numbers?

Python file.tell gives wrong value location
