How can I base64 encode using a custom letter set?

Question:

I am trying to Base64-encode using a custom character set in Python 3. Most of the examples I have seen on SO are for Python 2, so I had to make some minor adjustments to the code. The issue I am facing is that I am replacing the character / with _, but the output still contains /. This is just an example; I am not trying to Base64-encode only with URL-safe characters. custom could be any character set of the correct length. My code is:

import base64

data = 'some random? data'
print(base64.b64encode(data.encode()))

std_base64chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
custom = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

data = data.translate(str.maketrans(custom, std_base64chars)).encode()

print(base64.b64encode(data))

# Both prints
b'c29tZSByYW5kb20/IGRhdGE='
b'c29tZSByYW5kb20/IGRhdGE='

How can I get the translation to work so that occurrences of / are replaced correctly with _?

Edit

I should make it clear that I am not trying to do only one type of Base64 encoding here, such as urlsafe, but any possible character set. This will be a function where a user can pass their own charset. I am looking for a character-by-character mapping, not string slicing.

Edit

Because there is some confusion about my question, I will try to add more details.

I am trying to write a function that takes an arbitrary charset from a user and maps the characters individually as part of Base64 encoding. Most of the answers have been about manipulating altchars or slicing and replacing strings, but that doesn't cover every case.

So for example, the itoa64 charset is:
./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz=
and the Unix crypt format is:
./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
The answers, although correct, do not address these situations.

Asked By: securisec

Answers:

If the only characters you want to switch are + and /, you can use base64.urlsafe_b64encode(), which replaces them with - and _ respectively.

>>> base64.urlsafe_b64encode(data.encode())
b'c29tZSByYW5kb20_IGRhdGE='

Alternatively, you can replace those two characters with characters of your own choice using the optional altchars argument of base64.b64encode():

>>> base64.b64encode(data.encode(), '*&'.encode())
b'c29tZSByYW5kb20&IGRhdGE='
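The same altchars value has to be supplied again when decoding, otherwise the substituted characters are not mapped back to + and /. A minimal round-trip sketch, assuming the same '*&' pair as above:

```python
import base64

altchars = b"*&"  # stand-ins for "+" and "/", in that order

encoded = base64.b64encode(b"some random? data", altchars=altchars)
decoded = base64.b64decode(encoded, altchars=altchars)

print(encoded)  # b'c29tZSByYW5kb20&IGRhdGE='
print(decoded)  # b'some random? data'
```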

If you need an entirely new alphabet, you can encode first and then translate the encoded output character by character:

import base64

data = 'some random? data'
print(base64.b64encode(data.encode()))

std_base64chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
custom = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_"


# b64encode returns bytes, so translate with a bytes-to-bytes table
x = base64.b64encode(data.encode())
print(x.translate(bytes.maketrans(std_base64chars.encode(), custom.encode())))

Which outputs:

b'c29tZSByYW5kb20/IGRhdGE='
b'C29TzsbYyw5KB20_igrHDge='
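To turn this into a function that accepts any user-supplied alphabet, as the question asks, something like the following sketch might work. The key point is that the translation is applied to the encoded output, not to the input data (which is why the code in the question changed nothing). The names custom_b64encode, custom_b64decode and _check_alphabet are hypothetical, and the sketch assumes a 64-character ASCII alphabet with = kept as the padding character. It only remaps characters of standard Base64 output; it does not change how the bits are grouped.

```python
import base64

STD_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def _check_alphabet(alphabet: str) -> bytes:
    """Return the alphabet as bytes; reject anything but 64 unique ASCII chars."""
    raw = alphabet.encode("ascii")
    if len(raw) != 64 or len(set(raw)) != 64:
        raise ValueError("alphabet must contain exactly 64 unique characters")
    if b"=" in raw:
        # Assumption of this sketch: "=" stays reserved for padding.
        raise ValueError("alphabet must not contain the padding character '='")
    return raw


def custom_b64encode(data: bytes, alphabet: str) -> bytes:
    """Base64-encode data, then map each standard character to the custom one."""
    table = bytes.maketrans(STD_ALPHABET, _check_alphabet(alphabet))
    return base64.b64encode(data).translate(table)


def custom_b64decode(encoded: bytes, alphabet: str) -> bytes:
    """Map custom characters back to the standard alphabet, then decode."""
    table = bytes.maketrans(_check_alphabet(alphabet), STD_ALPHABET)
    return base64.b64decode(encoded.translate(table))


crypt_alphabet = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
encoded = custom_b64encode(b"some random? data", crypt_alphabet)
print(encoded)
print(custom_b64decode(encoded, crypt_alphabet))
```

Passing the URL-safe alphabet to custom_b64encode should give the same result as base64.urlsafe_b64encode(), which is a quick way to sanity-check the mapping.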
Answered By: CDJB

Shouldn’t this work:

import base64


data = b'some random? data'  # b64encode needs bytes in Python 3

custom = b"-_"  # replacements for "+" and "/"

rslt = base64.b64encode(data)
print(rslt.decode())

rslt = base64.b64encode(data, altchars=custom)
print(rslt.decode())

I get the following output:

c29tZSByYW5kb20/IGRhdGE=
c29tZSByYW5kb20_IGRhdGE=

Or, if you insist that custom contain the full alphabet:

custom = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

then use:

rslt = base64.b64encode(data, altchars=custom[-2:].encode())  # altchars must be bytes
Answered By: gelonida