How to print a Unicode character from a string variable?

Question:

I am new to the programming world, and I am a bit confused.

I expected both print calls to produce the same graphical Unicode exclamation mark symbol:

My experiment:

number   = 10071
byteStr  = number.to_bytes(4, byteorder='big')
hexStr   = hex(number)
uniChar  = byteStr.decode('utf-32be')
uniStr   = '\\u' + hexStr[2:6]
print(f'{number} - {hexStr[2:6]} - {byteStr} - {uniChar}')

print(f'{uniStr}')   # Not working
print(f'\u2757')     # Working

Output:

10071 - 2757 - b"\x00\x00'W" - ❗
\u2757
❗

What is the difference between the last two lines?
Please help me understand it!

My environment is JupyterHub with Python 3.9.

Asked By: Tapper

Answers:

An escape code is evaluated by the Python parser when it constructs literal strings. For example, the literal strings '马' and '\u9a6c' are evaluated by the parser as the same length-1 string.

You can (and did) build a string with the 6 characters \u2757 by using an escape code for the backslash (\\), which prevents the parser from evaluating those 6 characters as an escape code. That is why it prints as the 6-character \u2757.
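
A minimal sketch of that difference, using the code point from the question. The names `literal` and `built` are illustrative and do not appear in the original code:

```python
# The parser turns the escape in a literal into a single character.
literal = '\u2757'

# A doubled backslash yields one real backslash, so the concatenation
# produces six separate characters: \ u 2 7 5 7
built = '\\u' + '2757'

print(len(literal))      # 1
print(len(built))        # 6
print(literal == built)  # False
```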

If you build a byte string with those 6 characters, you can decode it with .decode('unicode_escape') to get the character:

>>> b'\\u2757'.decode('unicode_escape')
'❗'
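
The string in the question is a `str` rather than `bytes`, so it would need to be encoded first. A sketch, assuming the string contains only ASCII characters (the name `uni_str` is illustrative):

```python
# Rebuild the six-character string from the question: \u2757
uni_str = '\\u' + hex(10071)[2:6]

# Encode to bytes, then let the unicode_escape codec interpret the escape.
decoded = uni_str.encode('ascii').decode('unicode_escape')
print(decoded)
```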

But it is easier to use the chr() function on the number itself:

>>> chr(0x2757)
'❗'
>>> chr(10071)
'❗'
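
Applied to the variables from the question, this might look like the sketch below. The name `uni_char` and the `04x` format spec are choices made here for illustration, not part of the original code:

```python
number = 10071

# chr() maps an integer code point directly to its character.
uni_char = chr(number)

# The 04x format spec prints the code point as four hex digits.
print(f'{number} - {number:04x} - {uni_char}')

# ord() is the inverse of chr().
assert ord(uni_char) == number
```
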
Answered By: Mark Tolonen