Converting Unicode to Binary

Question

I am trying to convert Arabic text to UTF-8 encoded bytes and then to binary, using this answer here.

First, I used the code as it is in the example:

'{:b}'.format(int(u'سلام'.encode('utf-8').encode('hex'), 16))

But I got this error:

AttributeError: 'bytes' object has no attribute 'encode'

I also removed .encode('hex'), but it still gives the same error.

Is there any way to convert UTF-8 codes to binary and vice versa?

Asked By: Nujud

Answer 1

How about this:

>>> ''.join('{:08b}'.format(b) for b in 'سلام'.encode('utf8'))
'1101100010110011110110011000010011011000101001111101100110000101'

Iterating over the encoded bytes object yields one integer in the range 0..255 per byte.
Each integer is then formatted in binary notation, zero-padded to 8 digits.
Finally, str.join() glues everything together.
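
As a rough sketch, the same idea can be wrapped in a small function. The names text_to_bits and text_to_bits_via_hex are made up for illustration and are not part of any library. The second function is one possible Python 3 adaptation of the one-liner from the question, assuming bytes.hex() as the replacement for the Python 2 idiom .encode('hex'); note that it drops leading zero bits, which is why per-byte padding is probably the safer choice.

```python
def text_to_bits(text, encoding="utf-8"):
    """Return the encoded bytes of `text` as a str of 0s and 1s (hypothetical helper)."""
    # Iterating over a bytes object yields one integer in range(256) per byte;
    # "08b" formats it in binary, zero-padded to 8 digits.
    return "".join(format(byte, "08b") for byte in text.encode(encoding))


def text_to_bits_via_hex(text, encoding="utf-8"):
    """Adaptation of the question's one-liner to Python 3 (hypothetical helper)."""
    # bytes.hex() stands in for the Python 2 idiom .encode('hex').
    # The whole byte sequence becomes one big integer, so any leading
    # zero bits of the first byte are lost. Fails on empty input.
    return "{:b}".format(int(text.encode(encoding).hex(), 16))


s = text_to_bits("سلام")
print(s)  # 1101100010110011110110011000010011011000101001111101100110000101

# The two approaches agree here only because the first byte starts with a 1 bit.
print(text_to_bits("a"))          # 01100001
print(text_to_bits_via_hex("a"))  # 1100001
```

The lost leading zero in the last line matters for the inverse direction, since the bit string can then no longer be cut cleanly into groups of 8.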

For the inverse, the approach given in an answer to the question you linked can be adapted to Python 3 as follows (s is the output of the above example, i.e. a str of 0s and 1s):

>>> import re
>>> bytes(int(b, 2) for b in re.split('(........)', s) if b).decode('utf8')
'سلام'
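
If you would rather avoid the regular expression, slicing the string into 8-character chunks should work just as well. This is only a sketch: bits_to_text is a made-up name, and the length check is an added assumption about how malformed input ought to be handled.

```python
def bits_to_text(bits, encoding="utf-8"):
    """Decode a str of 0s and 1s back into text (hypothetical helper)."""
    # Assumption: input that is not a whole number of bytes is rejected
    # rather than silently decoded.
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    # Cut the string into 8-character chunks, one per byte.
    chunks = (bits[i:i + 8] for i in range(0, len(bits), 8))
    # int(chunk, 2) parses each chunk as a base-2 number in range(256).
    return bytes(int(chunk, 2) for chunk in chunks).decode(encoding)


s = "1101100010110011110110011000010011011000101001111101100110000101"
print(bits_to_text(s))  # سلام
```

Compared with re.split(), the slicing version fails early on a truncated bit string instead of treating the leftover bits as a short byte.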

Answered By: lenz
