Why does base64.b64encode() return a bytes object?
Question:
The purpose of base64.b64encode() is to convert binary data into ASCII-safe “text”. However, the method returns an object of type bytes:
>>> import base64
>>> base64.b64encode(b'abc')
b'YWJj'
It’s easy to simply take that output and decode() it, but my question is: what is the significance of base64.b64encode() returning bytes rather than a str?
Answers:
It’s impossible for b64encode() to know what you want to do with its output.
While in many cases you may want to treat the encoded value as text, in many others – for example, sending it over a network – you may instead want to treat it as bytes.
Since b64encode() can’t know, it refuses to guess. And since the input is bytes, the output remains the same type, rather than being implicitly coerced to str.
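To illustrate the bytes-stay-bytes point, here is a minimal sketch (the file name and payload are arbitrary) in which the encoded output is written straight to a binary stream, with no str involved at any point:
import base64

payload = base64.b64encode(b'some binary data')  # bytes in, bytes out
with open('payload.b64', 'wb') as f:             # binary-mode file expects bytes
    f.write(payload)                             # no decode/encode round-trip needed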
As you point out, decoding the output to str is straightforward:
base64.b64encode(b'abc').decode('ascii')
… as well as being explicit about the result.
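For the avoidance of doubt, the types before and after the explicit decode:
>>> encoded = base64.b64encode(b'abc')
>>> type(encoded)
<class 'bytes'>
>>> encoded.decode('ascii')
'YWJj'
>>> type(encoded.decode('ascii'))
<class 'str'>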
As an aside, it’s worth noting that although base64.b64decode() (note: decode, not encode) has accepted str since version 3.3, the change was somewhat controversial.
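In practice that means both of these work (the str input must be ASCII-only):
>>> base64.b64decode('YWJj')
b'abc'
>>> base64.b64decode(b'YWJj')
b'abc'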
“The purpose of the base64.b64encode() function is to convert binary data into ASCII-safe ‘text’”
Python disagrees with that – base64 has been intentionally classified as a binary transform.
It was a design decision in Python 3 to force the separation of bytes and text and prohibit implicit transformations. Python is now so strict about this that bytes.encode doesn’t even exist, and so b'abc'.encode('base64') would raise an AttributeError.
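For example (output from a recent CPython; the exact error message may vary slightly by version):
>>> b'abc'.encode('base64')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'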
The opinion the language takes is that a bytestring object is already encoded. A codec which encodes bytes into text does not fit into this paradigm, because when you want to go from the bytes domain to the text domain it’s a decode. Note that rot13 encoding was also banished from the list of standard encodings for the same reason – it didn’t fit properly into the Python 3 paradigm.
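The bytes-to-bytes transforms didn’t vanish entirely, though: they remain reachable through the codecs module, where base64 is explicitly a bytes-to-bytes codec (the short alias shown here works as of Python 3.4; note the trailing newline the codec appends):
>>> import codecs
>>> codecs.encode(b'abc', 'base64')
b'YWJj\n'
>>> codecs.decode(b'YWJj\n', 'base64')
b'abc'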
There is also a performance argument to be made: suppose Python automatically handled decoding of the base64 output, which is an ASCII-encoded binary representation produced by C code from the binascii module, into a Python object in the text domain. If you actually wanted the bytes, you would just have to undo the decoding by encoding into ASCII again. It would be a wasteful round-trip, an unnecessary double-negation. Better to ‘opt-in’ for the decode-to-text step.
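To make the round-trip concrete, a small sketch simulating the hypothetical automatic decode and the re-encode a bytes-wanting caller would then be forced into:
import base64

# simulate the hypothetical automatic decode-to-text step...
encoded_text = base64.b64encode(b'abc').decode('ascii')
# ...which a caller who actually wanted bytes would immediately have to undo:
encoded_bytes = encoded_text.encode('ascii')
assert encoded_bytes == base64.b64encode(b'abc')  # same bytes, two extra copies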