Why does base64 encode return bytes instead of string directly?

Question:

import base64
base64.b64encode(b'bytes required')

>>>b'Ynl0ZXMgcmVxdWlyZWQ='

If I understand correctly, base64 is a bytes <—-> string notation. Then why doesn’t it give me string 'Ynl0ZXMgcmVxdWlyZWQ=' directly?

Or does it expect me to do some decoding furthermore?

Like b'Ynl0ZXMgcmVxdWlyZWQ='.decode('ascii') or
b'Ynl0ZXMgcmVxdWlyZWQ='.decode('utf-8') ? But they result in the same thing.

Asked By: Rick

||

Answers:

You are correct that base64 is meant to be a textual representation of binary data.

However, you are neglecting constraints on the actual implementation side of things.

import sys

>>> sys.getsizeof("Hello World")
60
>>> sys.getsizeof("Hello World".encode("utf-8"))
44

str objects simply take up more system resources than bytes. This overhead can lead to non-trivial degradation in performance when working with larger bodies of base64 encoded data.

I also suspect that since the original python module was ported from python2.7 (which did not distinguish between str and bytes), that this might also just be a legacy part of the language.

Answered By: AlanSTACK
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.