Converting Unicode to Binary
Question:
I am trying to convert Arabic text to UTF-8 encoded bytes and then to binary, using this answer here.
First, I used the code as it is in the example:
'{:b}'.format(int(u'سلام'.encode('utf-8').encode('hex'), 16))
But I got this error:
AttributeError: 'bytes' object has no attribute 'encode'
I also removed .encode('hex'), but it still gives the same error.
Is there any way to convert UTF-8 encoded bytes to binary and vice versa?
Answers:
How about this:
>>> ''.join('{:08b}'.format(b) for b in 'سلام'.encode('utf8'))
'1101100010110011110110011000010011011000101001111101100110000101'
This iterates over the encoded bytes object, which yields an integer in the range 0..255 on each iteration. Each integer is then formatted in binary notation, zero-padded to 8 digits, and str.join() glues the pieces together.
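The same steps can be wrapped in a small helper. This is only a sketch: the name text_to_bits and its encoding parameter are made up for illustration and do not come from the linked answer.

```python
def text_to_bits(text, encoding="utf-8"):
    """Return the encoded bytes of `text` as a str of 0s and 1s."""
    # Iterating over a bytes object yields ints in range(256);
    # '{:08b}' zero-pads each one to exactly eight binary digits.
    return "".join("{:08b}".format(byte) for byte in text.encode(encoding))


bits = text_to_bits("سلام")
print(bits)
print(len(bits))  # 4 characters, 2 bytes each in UTF-8, 8 bits per byte
```

Padding every byte separately keeps leading zeros. Formatting the whole value as one big integer with '{:b}' would drop them whenever the first byte is below 128, for example for plain ASCII text.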
For the inverse, the approach given in an answer to the question you linked can be adapted to Python 3 as follows (s is the output of the above example, i.e. a str of 0s and 1s):
>>> import re
>>> bytes(int(b, 2) for b in re.split('(........)', s) if b).decode('utf8')
'سلام'
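If you would rather avoid the regular expression, slicing the string in steps of 8 should work as well. A minimal sketch, assuming the input holds only 0s and 1s; the name bits_to_text and the length check are illustrative additions, not part of the linked answer.

```python
def bits_to_text(bits, encoding="utf-8"):
    """Decode a str of 0s and 1s (8 digits per byte) back into text."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    # Cut the string into 8-character chunks, one chunk per byte.
    chunks = (bits[i:i + 8] for i in range(0, len(bits), 8))
    # int(chunk, 2) parses each chunk as a base-2 number in range(256).
    return bytes(int(chunk, 2) for chunk in chunks).decode(encoding)


s = "1101100010110011110110011000010011011000101001111101100110000101"
print(bits_to_text(s))
```

Any character other than 0 or 1 makes int(chunk, 2) raise ValueError, and a byte sequence that is not valid UTF-8 makes decode() raise UnicodeDecodeError.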