io.BufferedReader peek function returning all the text in the buffer

Question:

I am using Python 3.4.1 on Windows 8.

I would like to read a file with a buffered interface that allows me to peek a certain number of bytes ahead as well as read bytes. io.BufferedReader seems like the right choice.

Unfortunately, io.BufferedReader.peek seems useless because it appears to just return all the bytes stored in the buffer, rather than the number requested. In fact, this is allowed by the documentation of this function (emphasis mine):

peek([size]) Return bytes from the stream without advancing the
position. At most one single read on the raw stream is done to satisfy
the call. The number of bytes returned may be less or more than
requested.

To demonstrate what I consider useless behaviour, I have the following test file called Test1.txt:

first line
second line
third line

I create the io.BufferedReader object like this in IDLE:

>>> stream = io.BufferedReader(io.FileIO('Test1.txt'))

and then ask for two bytes,

>>> stream.peek(2)
b'first line\r\nsecond line\r\nthird line'

Eh? That’s just all the text in the default buffer size (which is 8192 bytes on my system). If I change this default, I can confirm that peek() is just returning the contents of the buffer,

>>> stream2 = io.BufferedReader(io.FileIO('Test1.txt'), buffer_size=2)
>>> stream2.peek(17)
b'fi'
>>> stream2.peek(17)
b'fi'
>>> stream2.read(2)
b'fi'
>>> stream2.peek(17)
b'rs'

To be clear, the following is the output I expect to see:

>>> stream = io.BufferedReader(io.FileIO('Test1.txt'))
>>> stream.peek(2)
b'fi'
>>> stream.read(1)
b'f'
>>> stream.peek(2)
b'ir'

That is, a typical buffered stream.

What am I doing wrong in constructing this BufferedReader? How can I observe the behaviour I expect to see in Python 3.4.1?

Asked By: Charles


Answers:

.peek() is indeed implemented as returning the current buffer; if you combined it with .read() calls you’d find that less and less of the buffer is returned until the buffer is filled up again.
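The shrinking and refilling described above can be observed with a small sketch. It assumes CPython's buffered reader and uses io.BytesIO as a stand-in for the file, with a deliberately tiny buffer_size of 4 so the refill is visible; the variable names are only illustrative.

```python
import io

data = b"first line\r\nsecond line\r\nthird line"

# BytesIO stands in for the file; a 4-byte buffer makes each refill visible.
stream = io.BufferedReader(io.BytesIO(data), buffer_size=4)

first = stream.peek(1)      # buffer is empty, so one raw read fills it
stream.read(1)              # consume one buffered byte
second = stream.peek(1)     # whatever is left in the buffer, no new raw read
stream.read(len(second))    # drain the rest of the buffer
third = stream.peek(1)      # buffer is empty again, so it is refilled

print(first, second, third)
```

Each peek() returns whatever is currently buffered, regardless of the size that was requested.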

For most purposes of .peek() this is more than fine. The number of bytes lets you limit how much data is expected from the underlying I/O source if the buffer is empty, which in turn is important if that source blocks on reads.

Simply slice the returned value:

stream.peek(num)[:num]
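As a minimal sketch of that slice, wrapped in a helper: the name peek_at_most is hypothetical, and io.BytesIO stands in for io.FileIO('Test1.txt') so the example is self-contained. Note that the slice only caps the result; peek() may still return fewer than num bytes, for example when the buffer is nearly drained or the file is about to end.

```python
import io

def peek_at_most(stream, num):
    """Return up to num bytes without advancing the stream position."""
    # peek() may return more than requested, so cap the result with a slice.
    # It may also return fewer bytes than num; the slice cannot fix that.
    return stream.peek(num)[:num]

# BytesIO stands in for the file from the question.
stream = io.BufferedReader(io.BytesIO(b"first line\r\nsecond line\r\nthird line"))

a = peek_at_most(stream, 2)
b = stream.read(1)
c = peek_at_most(stream, 2)
print(a, b, c)  # b'fi' b'f' b'ir'
```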
Answered By: Martijn Pieters