Output of Python's base64.b64decode doesn't match Java's decodeBase64

Question:

I'm trying to port some Scala code to Python 3 and am currently stuck at decoding a Base64 string. The output from Python's base64.b64decode does not match Scala's output.

Scala:


import org.apache.commons.codec.binary.Base64.decodeBase64


val coded_str = "UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA=="
decodeBase64(coded_str)

//Output 1 :
res1: Array[Byte] = Array(82, 2, -96, 15, 8, 104, 16, 0, 0, 52, 1, 0, -42, -42, 0, 0, 48, 1, 0, 26, 1, 0, 19, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 2, 0, 0, 0, 0, 0, -6, -79, 62, 0, 59, 70, 0, 0)



coded_str.getBytes()

//Output 2
res2: Array[Byte] = Array(85, 103, 75, 103, 68, 119, 104, 111, 69, 65, 65, 65, 78, 65, 69, 65, 49, 116, 89, 65, 65, 68, 65, 66, 65, 66, 111, 66, 65, 66, 77, 65, 65, 65, 65, 65, 65, 81, 65, 65, 65, 65, 69, 65, 65, 81, 65, 67, 65, 65, 65, 65, 65, 65, 68, 54, 115, 84, 52, 65, 79, 48, 89, 65, 65, 65, 61, 61)

In Python, I tried:

import base64
coded_str = 'UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA=='


print (base64.b64decode(coded_str))

#Output 1 :

b'R\x02\xa0\x0f\x08h\x10\x00\x004\x01\x00\xd6\xd6\x00\x000\x01\x00\x1a\x01\x00\x13\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x01\x00\x02\x00\x00\x00\x00\x00\xfa\xb1>\x00;F\x00\x00'

#Command 2:


b = [ord(s) for s in coded_str]
print (b)

#Output 2
[85, 103, 75, 103, 68, 119, 104, 111, 69, 65, 65, 65, 78, 65, 69, 65, 49, 116, 89, 65, 65, 68, 65, 66, 65, 66, 111, 66, 65, 66, 77, 65, 65, 65, 65, 65, 65, 81, 65, 65, 65, 65, 69, 65, 65, 81, 65, 67, 65, 65, 65, 65, 65, 65, 68, 54, 115, 84, 52, 65, 79, 48, 89, 65, 65, 65, 61, 61]

I'm trying to get Output 1 from Python to match Scala's.

Output 2 matches, but I don't know how to convert it from there.

Any help would be appreciated. Thanks!

Trying to get the same result in Python that I see in Scala.

Array(82, 2, -96, 15, 8, 104, 16, 0, 0, 52, 1, 0, -42, -42, 0, 0, 48, 1, 0, 26, 1, 0, 19, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 2, 0, 0, 0, 0, 0, -6, -79, 62, 0, 59, 70, 0, 0)

Asked By: tde


Answers:

You get the same output; it's just bytes.

import base64
coded_str = 'UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA=='


decoded_str = base64.b64decode(coded_str)

# in Python 3, iterating over bytes already yields unsigned ints (0-255),
# so ord() is not needed (calling ord() on an int raises TypeError)

bytes_ord = list(decoded_str)

# but in java those look like signed bytes which take a tiny bit more effort...
import struct
bytes_match = struct.unpack(f"{len(decoded_str)}b",decoded_str)
print(bytes_match)
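
As a cross-check, here is a minimal sketch that does the same signed conversion without struct and compares the result to the array Scala printed in the question. The names `signed` and `scala_output` are made up for this illustration; they are not part of either library.

```python
import base64

coded_str = "UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA=="
decoded = base64.b64decode(coded_str)

# Iterating over a bytes object in Python 3 yields unsigned ints (0..255).
# Values above 127 are shifted down by 256 to mimic the JVM's signed Byte.
signed = [b - 256 if b > 127 else b for b in decoded]

# The array Scala printed in the question, copied verbatim.
scala_output = [82, 2, -96, 15, 8, 104, 16, 0, 0, 52, 1, 0, -42, -42, 0, 0,
                48, 1, 0, 26, 1, 0, 19, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 2,
                0, 0, 0, 0, 0, -6, -79, 62, 0, 59, 70, 0, 0]

print(signed == scala_output)  # True
```

Note that struct.unpack returns a tuple, so wrap it in list() if you want to compare it against a list like this one.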
Answered By: Joran Beasley

No, it is the same.

This:

b’Rx02xa0x0fx08hx10x00x004x01x00xd6xd6x00x000x01x00x1ax01x00x13x00x00x00x00x01x00x00x00x01x00x01x00x02x00x00x00x00x00xfaxb1>x00;Fx00x00′

and this:

res1: Array[Byte] = Array(82, 2, -96, 15, 8, 104, 16, 0, 0, 52, 1, 0, -42, -42, 0, 0, 48, 1, 0, 26, 1, 0, 19, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 2, 0, 0, 0, 0, 0, -6, -79, 62, 0, 59, 70, 0, 0)

Are in fact the exact same sequence.

82 is the ASCII code for capital R. Hence, the 82 on the Scala side and the R (the first character in your Python bytes literal) both mean: "a byte whose value is 82".

The second byte is \x02 on the Python side and 2 on the Scala side. Same thing: the byte with value 2 is not a printable ASCII character, so Python shows it as the escape \x02. It's the same byte.
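
To see that display rule in isolation, here is a small sketch using a few byte values picked from the question's output (the variable name `data` is just for illustration). Printable ASCII values are shown as characters, everything else as a hex escape:

```python
# 82, 62, 59 and 70 are printable ASCII ('R', '>', ';', 'F'); 2 is not.
data = bytes([82, 2, 62, 59, 70])

print(data)        # b'R\x02>;F'
print(list(data))  # [82, 2, 62, 59, 70]
```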

And so on. -96 is the same as \xa0: \xa0 states the byte as unsigned hexadecimal, and -96 is the exact same bit sequence printed as a two's complement signed decimal. Undoing two's complement (flip the bits, then add 1): 96 = 0110 0000; flipping all bits gives 1001 1111; adding 1 gives 1010 0000, which is 128 + 32 = 160. In hex terms: 16 goes into 160 exactly 10 ('a') times with remainder 0, so \xa0.
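
The same arithmetic can be replayed in Python. This is only a sketch of the steps described above, with variable names chosen for the illustration:

```python
value = 96                        # magnitude of the signed byte -96
flipped = ~value & 0xFF           # flip all 8 bits
unsigned = (flipped + 1) & 0xFF   # then add 1

print(format(value, "08b"))       # 01100000
print(format(flipped, "08b"))     # 10011111
print(format(unsigned, "08b"))    # 10100000
print(unsigned, hex(unsigned))    # 160 0xa0

# Shortcut: masking a negative int with 0xFF gives the same unsigned byte.
print(-96 & 0xFF)                 # 160

# And back again: interpret the single byte 0xa0 as a signed value.
print(int.from_bytes(b"\xa0", byteorder="big", signed=True))  # -96
```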

The 70 near the end shows up as 'F' in the Python string because 70 is the ASCII code for capital F, and so on.

In general, don’t attempt to print raw bytes like this, as it’s just confusing.
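
If you do need to look at the bytes, one option (a sketch, not the only way) is to print them as a list of ints or as a hex string, so nothing gets rendered as an ASCII character. Only the first four bytes are shown here to keep the output short:

```python
import base64

coded_str = "UgKgDwhoEAAANAEA1tYAADABABoBABMAAAAAAQAAAAEAAQACAAAAAAD6sT4AO0YAAA=="
decoded = base64.b64decode(coded_str)

first_four = decoded[:4]
print(list(first_four))   # [82, 2, 160, 15]
print(first_four.hex())   # 5202a00f
```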

Answered By: rzwitserloot