Python: How to convert string of an encoded Unicode variable to a binary variable

Question:

I am building an app to transliterate Myanmar (Burmese) text to International Phonetic Alphabet. I have found that it’s easier to manipulate combining Unicode characters in my dictionaries as binary variables like this

b'xe1x80xadxe1x80xafxe1x80x84': 'áɪŋ̃',    #'ိုင'

because otherwise they glue to neighboring characters like apostrophes and I can’t get them unstuck.
I am translating UTF-8 Burmese characters to binary hex variables to build my dictionaries. Sometimes I want to convert backwards. I wrote a simple program:

while True:
    binary = input("Enter binary Unicode string or 'exit' to exit: ")
    if binary == 'exit':
        break
    else:
        print(binary.decode('utf-8'))

Obviously this will not work for an input such as b'xe1x80xad', since input() returns a string.
I would like to know the quickest way to convert the string "b'xe1x80xad'" to it’s Unicode form ‘ ိ ‘. My only idea is

binary = bin(int(binary, 16))

but that returns

ValueError: invalid literal for int() with base 16: "b'\xe1\x80\xad\'"

Please help! Thanks.

Asked By: Deersfeet

||

Answers:

The issue is that the input string "b’xe1x80xad’" is not a valid hexadecimal string that can be converted to Unicode directly. However, you can use the ast module to safely evaluate the string as a Python literal and get the corresponding bytes object. Here’s an example:

import ast

binary = "b'\xe1\x80\xad'"
bytes_obj = ast.literal_eval(binary)
unicode_str = bytes_obj.decode('utf-8')
print(unicode_str)

This should output the Unicode character ‘ိ’. The ast.literal_eval() function safely evaluates the input string as a Python literal, which in this case is a bytes object. You can then decode the bytes object to get the corresponding Unicode string.

Answered By: blackscratch22
Categories: questions Tags: , , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.