How to print Unicode like "\u{variable}" in Python 2.7?
Question:
For example, I can print a Unicode symbol like:
print u'\u00E0'
Or
a = u'\u00E0'
print a
But it looks like I can’t do something like this:
a = '\u00E0'
print someFunctionToDisplayTheCharacterRepresentedByThisCodePoint(a)
The main use case will be in loops. I have a list of Unicode code points and I wish to display them on the console. Something like:
with open("someFileWithAListOfUnicodeCodePoints") as uniCodeFile:
    for codePoint in uniCodeFile:
        print codePoint  # I want the console to display the Unicode character here
The file has a list of Unicode code points. For example:
2109
00B0
00E4
1F1E6
The loop should output:
℉
°
ä
🇦
Any help will be appreciated!
Answers:
This is probably not a great way, but it’s a start:
>>> import struct
>>> x = '00e4'
>>> print unicode(struct.pack("!I", int(x, 16)), 'utf_32_be')
ä
First, we get the integer represented by the hexadecimal string x. We pack that into a byte string, which we can then decode using the utf_32_be encoding.
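For reference, the same pack-then-decode idea works in Python 3, where decoding a bytes object yields str directly. A minimal sketch:

```python
import struct

# Pack the code point into 4 big-endian bytes, then decode as UTF-32-BE.
x = '00e4'
char = struct.pack("!I", int(x, 16)).decode('utf_32_be')
print(char)  # ä
```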
Since you are doing this a lot, you can precompile the struct:
int2bytes = struct.Struct("!I").pack
with open("someFileWithAListOfUnicodeCodePoints") as fh:
    for code_point in fh:
        print unicode(int2bytes(int(code_point, 16)), 'utf_32_be')
If you think it’s clearer, you can also use the decode method instead of the unicode type directly:
>>> print int2bytes(int('00e4', 16)).decode('utf_32_be')
ä
Python 3 added a to_bytes method to the int class that lets you bypass the struct module:
>>> str(int('00e4', 16).to_bytes(4, 'big'), 'utf_32_be')
'ä'
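Putting the to_bytes version together, a Python 3 sketch might look like the following (the hex strings below stand in for lines read from the question’s file):

```python
# Sketch: these hex strings stand in for lines read from the
# question's code-point file.
lines = ["2109", "00B0", "00E4", "1F1E6"]
chars = []
for line in lines:
    cp = int(line.strip(), 16)
    # 4 big-endian bytes per code point, decoded as UTF-32-BE.
    chars.append(str(cp.to_bytes(4, 'big'), 'utf_32_be'))
    print(chars[-1])
```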
These are Unicode code points, but they lack the \u prefix of a Python unicode-escape. So, just put it in:
with open("someFileWithAListOfUnicodeCodePoints", "rb") as uniCodeFile:
    for codePoint in uniCodeFile:
        print ("\u" + codePoint.strip()).decode("unicode-escape")
Whether this works on a given system depends on the console’s encoding. If it’s a Windows code page and the characters are not in its range, you’ll still get funky errors.
In Python 3 that would be b"\u".
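A Python 3 sketch of the same unicode-escape idea, with one caveat: the \u escape consumes exactly four hex digits, so code points above U+FFFF (like 1F1E6 in the question’s file) need the eight-digit \U form instead. The helper name below is illustrative, not from the original answer:

```python
# Hypothetical helper: build a \u or \U escape from a hex code-point
# string, then decode it with the unicode_escape codec.
def escape_decode(cp):
    cp = cp.strip()
    if len(cp) <= 4:
        escaped = b"\\u" + cp.rjust(4, "0").encode("ascii")
    else:
        escaped = b"\\U" + cp.rjust(8, "0").encode("ascii")
    return escaped.decode("unicode_escape")

print(escape_decode("00E4"))   # ä
print(escape_decode("1F1E6"))
```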
You want print unichr(int('00E0', 16)). Convert the hex string to an integer and print the character at that code point.
Caveat: On Windows, Python 2 is a narrow Unicode build, so unichr fails for code points > U+FFFF.
Solution: Use Python 3.3+ and print(chr(int(line, 16))).
In all cases you’ll still need to use a font that supports the glyphs for the codepoints.
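As a sketch, the chr approach applied to the question’s sample values in Python 3:

```python
# chr in Python 3 covers the full Unicode range, including code points
# above U+FFFF that unichr rejects on narrow Python 2 builds.
for hex_cp in ["2109", "00B0", "00E4", "1F1E6"]:
    print(chr(int(hex_cp, 16)))
```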