Python converting from base64 to binary

Question:

I have a problem converting a base64-encoded string into binary. I am collecting the Fingerprint2D property from the following link:

url = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/108770/property/Fingerprint2D/xml"

Fingerprint2D=AAADccB6OAAAAAAAAAAAAAAAAAAAAAAAAAA8WIEAAAAAAACxAAAAHgAACAAADAzBmAQwzoMABgCI AiTSSACCCAAhIAAAiAEMTMgMJibMsZuGeijn4BnI+YeQ0OMOKAACAgAKAABQAAQEABQAAAAAAAAA AA==

The description in PubChem says that this is a 115-byte string, and that it should be 920 bits when converted into binary. I tried to convert it to binary with the following:

    response = requests.get(url)
    tree = ET.fromstring(response.text)

    for el in tree[0]:
        if "Fingerprint2D" in el.tag:
            fpp = bin(int(el.text, 16))
            print(len(fpp))

If I use the code above, I get the following error: “ValueError: invalid literal for int() with base 16”.

And if I use the code below, the length of fpp (binary) is 1278, which is not what I expected.

    response = requests.get(url)
    tree = ET.fromstring(response.text)

    for el in tree[0]:
        if "Fingerprint2D" in el.tag:
            fpp = bin(int(hexlify(el.text), 16))
            print(len(fpp))

Thanks a lot already!!

Asked By: patti_jane

Answers:

To decode base64 you need to pass a bytes object to the base64.decodebytes function:

    import base64

    # Whitespace inside the encoded text is ignored by the decoder.
    t = "AAADccB6OAAAAAAAAAAAAAAAAAAAAAAAAAA8WIEAAAAAAACxAAAAHgAACAAADAzBmAQwzoMABgCI AiTSSACCCAAhIAAAiAEMTMgMJibMsZuGeijn4BnI+YeQ0OMOKAACAgAKAABQAAQEABQAAAAAAAAA AA==".encode("ascii")

    decoded = base64.decodebytes(t)

    print(decoded)           # the raw bytes
    print(len(decoded)*8)    # number of bits

I get the following:

    b'\x00\x00\x03q\xc0z8\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00<X\x81\x00\x00\x00\x00\x00\x00\xb1\x00\x00\x00\x1e\x00\x00\x08\x00\x00\x0c\x0c\xc1\x98\x040\xce\x83\x00\x06\x00\x88\x02$\xd2H\x00\x82\x08\x00! \x00\x00\x88\x01\x0cL\xc8\x0c&&\xcc\xb1\x9b\x86z(\xe7\xe0\x19\xc8\xf9\x87\x90\xd0\xe3\x0e(\x00\x02\x02\x00\n\x00\x00P\x00\x04\x04\x00\x14\x00\x00\x00\x00\x00\x00\x00\x00'
    920

So 920 bits as expected.
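
The 115-byte figure can also be checked from the encoded text alone, since base64 maps every 4 characters to 3 bytes and each trailing = removes one byte. A minimal sketch, assuming the string is the one from the question; decoded_length is a hypothetical helper, not part of the base64 module:

```python
import base64


def decoded_length(b64_text: str) -> int:
    """Byte count a padded base64 string decodes to (hypothetical helper)."""
    compact = "".join(b64_text.split())  # drop whitespace left by line wrapping
    padding = len(compact) - len(compact.rstrip("="))
    return len(compact) // 4 * 3 - padding


# Fingerprint2D value copied from the question, spaces included.
fingerprint = "AAADccB6OAAAAAAAAAAAAAAAAAAAAAAAAAA8WIEAAAAAAACxAAAAHgAACAAADAzBmAQwzoMABgCI AiTSSACCCAAhIAAAiAEMTMgMJibMsZuGeijn4BnI+YeQ0OMOKAACAgAKAABQAAQEABQAAAAAAAAA AA=="

n_bytes = decoded_length(fingerprint)
# The estimate should agree with what the decoder actually returns.
assert n_bytes == len(base64.b64decode(fingerprint))
print(n_bytes, n_bytes * 8)
```

This also hints at why int(el.text, 16) fails: the base64 alphabet contains characters that are not hexadecimal digits, so the text has to be base64-decoded rather than parsed as a number.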

To get the data as a binary string, iterate over the bytes, convert each one to binary using format with zero-padding to 8 digits (bin adds a 0b prefix and drops leading zeros, so it’s not suitable), and join the strings together:

    print("".join(["{:08b}".format(x) for x in decoded]))

results in:

00000000000000000000001101110001110000000111101000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011110001011000100000010000000000000000000000000000000000000000000000001011000100000000000000000000000000011110000000000000000000001000000000000000000000001100000011001100000110011000000001000011000011001110100000110000000000000110000000001000100000000010001001001101001001001000000000001000001000001000000000000010000100100000000000000000000010001000000000010000110001001100110010000000110000100110001001101100110010110001100110111000011001111010001010001110011111100000000110011100100011111001100001111001000011010000111000110000111000101000000000000000001000000010000000000000101000000000000000000101000000000000000001000000010000000000000101000000000000000000000000000000000000000000000000000000000000000000

(which is 920 chars, as expected)
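
As a cross-check on the per-byte approach, the whole byte string can also be converted in one step with int.from_bytes, as long as the result is padded back to len(data) * 8 digits. The sketch below uses a short three-byte sample rather than the full fingerprint, and both helper names are made up for illustration:

```python
def bytes_to_bits(data: bytes) -> str:
    """Per-byte conversion: each byte becomes exactly 8 binary digits."""
    return "".join(f"{byte:08b}" for byte in data)


def bytes_to_bits_via_int(data: bytes) -> str:
    """Single-integer conversion, zero-padded to the full bit width."""
    return format(int.from_bytes(data, "big"), f"0{len(data) * 8}b")


sample = b"\x00\x03q"  # three bytes: 0x00, 0x03, 0x71

print(bin(int.from_bytes(sample, "big")))  # 0b1101110001 (prefix added, leading zeros lost)
print(bytes_to_bits(sample))               # 000000000000001101110001
print(bytes_to_bits_via_int(sample))       # 000000000000001101110001
```

Both helpers give the same 24 digits for non-empty input, while plain bin returns only 10 digits plus the 0b prefix, which is the kind of length mismatch to watch for.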

The easiest way to perform this action using Python 3 is this:

    import base64
    # base64_to_binary_input is a placeholder for the encoded string
    base64.b64decode(base64_to_binary_input)

Answered By: karthik r
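
Putting the pieces together with the XML parsing from the question might look like the sketch below. It is only an illustration under stated assumptions: fingerprint_bits_from_xml is a hypothetical helper, the sample document is a made-up miniature standing in for response.text, and the real PubChem response may be structured differently. base64.b64decode accepts an ASCII str directly, so no explicit encode step is needed.

```python
import base64
import xml.etree.ElementTree as ET


def fingerprint_bits_from_xml(xml_text: str) -> str:
    """Return the Fingerprint2D value as a string of 0s and 1s (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        # Tags may carry an XML namespace prefix, so match on the suffix.
        if el.tag.endswith("Fingerprint2D") and el.text:
            raw = base64.b64decode(el.text)  # line breaks in the text are skipped
            return "".join(f"{byte:08b}" for byte in raw)
    raise ValueError("no Fingerprint2D element found")


# Made-up miniature document; in the question this would be response.text.
sample_xml = (
    "<PropertyTable><Properties><CID>1</CID>"
    "<Fingerprint2D>AAAD\ncQ==</Fingerprint2D>"
    "</Properties></PropertyTable>"
)

print(fingerprint_bits_from_xml(sample_xml))  # 00000000000000000000001101110001
```

Walking the whole tree with iter() avoids depending on the element's exact position, which the tree[0] loop in the question assumes.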