How do the write(), read() and getvalue() methods of Python's io.BytesIO work?

Question:

I’m trying to understand the write() and read() methods of io.BytesIO.
My understanding was that I could use the io.BytesIO as I would use a File
object.

import io
in_memory = io.BytesIO(b'hello')
print( in_memory.read() )

The code above prints b'hello' as expected, but the code below prints an empty bytes object, b''.

import io
in_memory = io.BytesIO(b'hello')
in_memory.write(b' world')
print( in_memory.read() )

My questions are:

- What is io.BytesIO.write(b' world') doing exactly?

- What is the difference between io.BytesIO.read() and io.BytesIO.getvalue()?

I assume that the answer is related to io.BytesIO being a stream object, but the big picture is not clear to me.

Asked By: Robert


Answers:

This is an in-memory stream, but it is still a stream. The position is stored, so, as with any other stream, if you want to read after writing you have to reposition first:

import io
in_memory = io.BytesIO(b'hello')
in_memory.seek(0,2)   # seek to end, else we overwrite
in_memory.write(b' world')
in_memory.seek(0)    # seek to start
print( in_memory.read() )

prints:

b'hello world'

in_memory.getvalue(), by contrast, doesn't need the final seek(0), because it returns the entire contents of the stream regardless of the current position.
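
As a small sketch of that difference (the variable names remaining and everything are only illustrative), the same append can be followed by read() and getvalue() without seeking back to the start. io.SEEK_END is the named constant for the value 2 used above:

```python
import io

in_memory = io.BytesIO(b'hello')
in_memory.seek(0, io.SEEK_END)   # io.SEEK_END == 2: move to the end before writing
in_memory.write(b' world')

# The position is now at the end, so read() has nothing left to return...
remaining = in_memory.read()
# ...but getvalue() ignores the position and returns everything.
everything = in_memory.getvalue()

print(remaining)    # b''
print(everything)   # b'hello world'
```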

The issue is that you are positioned at the end of the stream. Think of the position like a cursor. Once you have written b' world', your cursor is at the end of the stream. When you try to .read(), you are reading everything after the position of the cursor – which is nothing, so you get the empty bytestring.
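
One way to watch the cursor move is tell(), which reports the current position. The following is just a minimal sketch of the situation from the question, with illustrative variable names:

```python
import io

in_memory = io.BytesIO(b'hello')
start = in_memory.tell()          # position right after construction
in_memory.write(b' world')        # writes 6 bytes starting at position 0
after_write = in_memory.tell()    # position right after the write
data = in_memory.read()           # reads from the cursor to the end

print(start, after_write, data)   # 0 6 b''
```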

To navigate around the stream you can use the .seek method:

>>> import io
>>> in_memory = io.BytesIO(b'hello')
>>> in_memory.write(b' world')
6
>>> in_memory.seek(0)  # go to the start of the stream
0
>>> print(in_memory.read())
b' world'

Note that, just like a file opened for updating ('r+' mode), writing starts at the current position, so the initial bytes b'hello' have been overwritten by b' world'.

.getvalue() just returns the entire contents of the stream regardless of current position.
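
A short sketch of that last point, assuming a buffer that is only partially read: read() advances the position, while getvalue() neither depends on it nor changes it.

```python
import io

in_memory = io.BytesIO(b'hello world')
first = in_memory.read(5)          # consumes b'hello', position is now 5
snapshot = in_memory.getvalue()    # whole buffer, position untouched
position = in_memory.tell()        # still 5
rest = in_memory.read()            # continues from position 5

print(first, snapshot, position, rest)
# b'hello' b'hello world' 5 b' world'
```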

Answered By: johnpaton

BytesIO does behave like a file, only one that you can both read and write. The confusing part, maybe, is that reading and writing share a single position. So first you do:

in_memory = io.BytesIO(b'hello')

This gives you a bytes buffer in in_memory with the contents b'hello' and with the read/write position at the beginning (before the first b'h'). When you do:

in_memory.write(b' world')

You are effectively overwriting b'hello' with b' world' (which is one byte longer, so the buffer grows by one byte), and the position is now at the end (after the last b'd'). So when you do:

print( in_memory.read() )

You get an empty bytes object, b'', because there is nothing to read after the current position. You can, however, use seek to move the position, so if you do:

import io
in_memory = io.BytesIO(b'hello')
in_memory.write(b' world')
in_memory.seek(0)
print( in_memory.read() )

You get:

b' world'

Note that you do not see the initial b'hello' because it was overwritten. If you want to write after the initial content, you can first seek to the end:

import io
in_memory = io.BytesIO(b'hello')
in_memory.seek(0, 2)
in_memory.write(b' world')
in_memory.seek(0)
print( in_memory.read() )

Output:

b'hello world'
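
To make the overwriting behaviour more visible, here is a sketch (with made-up data) in which the first write is shorter than the existing content, so only the leading bytes are replaced, followed by a longer write that grows the buffer:

```python
import io

in_memory = io.BytesIO(b'hello')
in_memory.write(b'HE')            # overwrites only the first two bytes
partial = in_memory.getvalue()

in_memory.seek(0)
in_memory.write(b' world')        # six bytes: replaces all five and adds one
grown = in_memory.getvalue()

print(partial)   # b'HEllo'
print(grown)     # b' world'
```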

EDIT: About getvalue, as pointed out by other answers, it returns the full contents of the internal buffer as bytes, independently of the current position. Ordinary file objects have no equivalent method.
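
For a real file, something similar can be approximated by hand. The helper below, read_all_preserving_position, is a hypothetical name and not part of the io module. It is only a sketch that assumes a seekable binary stream: it remembers the position, reads from the start, and then restores the position.

```python
import io
import tempfile

def read_all_preserving_position(stream):
    """Hypothetical helper: return the whole content of a seekable
    binary stream and leave its position where it was."""
    saved = stream.tell()
    stream.seek(0)
    try:
        return stream.read()
    finally:
        stream.seek(saved)   # restore the original position

# Works on a temporary file (opened in 'w+b' mode by default)...
with tempfile.TemporaryFile() as f:
    f.write(b'hello world')
    file_content = read_all_preserving_position(f)
    file_position = f.tell()

# ...and gives the same result as getvalue() on a BytesIO.
buf = io.BytesIO(b'hello')
buf.read(2)
same_as_getvalue = read_all_preserving_position(buf) == buf.getvalue()
```

With BytesIO there is no need for this, since getvalue() does it directly.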

Answered By: jdehesa