How to print Unicode like “\u{variable}” in Python 2.7?

Question:

For example, I can print Unicode symbol like:

print u'\u00E0'

Or

a = u'\u00E0'
print a

But it looks like I can’t do something like this:

a = '\u00E0'
print someFunctionToDisplayTheCharacterRepresentedByThisCodePoint(a)

The main use case will be in loops. I have a list of Unicode code points and I wish to display them on the console. Something like:

with open("someFileWithAListOfUnicodeCodePoints") as uniCodeFile:
    for codePoint in uniCodeFile:
        print codePoint  # I want the console to display the Unicode character here

The file has a list of Unicode code points, one per line. For example:

2109
00B0
00E4
1F1E6

The loop should output:

℉
°
ä
🇦

Any help will be appreciated!

Asked By: Zaid Tariq


Answers:

This is probably not a great way, but it’s a start:

>>> import struct
>>> x = '00e4'
>>> print unicode(struct.pack("!I", int(x, 16)), 'utf_32_be')
ä

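First, we get the integer represented by the hexadecimal string x. We pack that into a byte string as a 4-byte big-endian unsigned integer (the "!I" format). That is exactly the UTF-32-BE encoding of the code point, so decoding the bytes with the utf_32_be codec yields the character.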
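To make the intermediate bytes visible, here is a minimal sketch of the same idea written for Python 3 (so it runs on a current interpreter). The helper name `code_point_to_char` is hypothetical, not part of the answer:

```python
import struct

def code_point_to_char(hex_string):
    """Hypothetical helper: turn a hex code point such as '00e4' into a str."""
    value = int(hex_string, 16)          # '00e4' -> 228 (0xE4)
    packed = struct.pack("!I", value)    # big-endian 4-byte unsigned int
    # UTF-32-BE encodes every code point as exactly these four bytes,
    # so decoding the packed integer gives back the character itself.
    return packed.decode("utf_32_be")

packed_e4 = struct.pack("!I", int("00e4", 16))  # b'\x00\x00\x00\xe4'
char_e4 = code_point_to_char("00e4")            # 'ä'
```

Because UTF-32 uses one fixed-width unit per code point, this also works for values above U+FFFF, such as 1F1E6.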

Since you are doing this a lot, you can precompile the struct:

import struct

int2bytes = struct.Struct("!I").pack
with open("someFileWithAListOfUnicodeCodePoints") as fh:
    for code_point in fh:
        # int() ignores the surrounding whitespace, including the trailing newline
        print unicode(int2bytes(int(code_point, 16)), 'utf_32_be')
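A Python 3 sketch of the same loop, with io.StringIO standing in for the file; its contents are the sample code points from the question, and `decode_code_points` is a hypothetical name. It also skips blank lines, since int('', 16) raises ValueError and a trailing empty line would otherwise stop the loop:

```python
import io
import struct

# Precompile the format once: "!" = network (big-endian) byte order,
# "I" = 4-byte unsigned int, i.e. exactly one UTF-32-BE code unit.
int2bytes = struct.Struct("!I").pack

def decode_code_points(lines):
    """Yield one character per non-blank line of hex code points."""
    for line in lines:
        line = line.strip()
        if not line:          # skip blank lines, e.g. at the end of the file
            continue
        yield int2bytes(int(line, 16)).decode("utf_32_be")

# Stand-in for open("someFileWithAListOfUnicodeCodePoints")
sample_file = io.StringIO("2109\n00B0\n00E4\n1F1E6\n")
chars = list(decode_code_points(sample_file))
```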

If you think it’s clearer, you can also use the decode method instead of the unicode type directly:

>>> print int2bytes(int('00e4', 16)).decode('utf_32_be')
ä

Python 3 added a to_bytes method to the int class that lets you bypass the struct module:

>>> str(int('00e4', 16).to_bytes(4, 'big'), 'utf_32_be')
'ä'

Answered By: chepner

These are Unicode code points that just lack Python's unicode-escape prefix, so add it and decode:

with open("someFileWithAListOfUnicodeCodePoints", "rb") as uniCodeFile:
    for codePoint in uniCodeFile:
        # \u takes exactly four hex digits, so this covers code points up to U+FFFF only
        print ("\u" + codePoint.strip()).decode("unicode-escape")

Whether this works on a given system depends on the console's encoding. If it's a Windows code page and the characters are not in its range, you'll still get errors (typically UnicodeEncodeError).
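To see that failure mode without a Windows console, one can encode the characters by hand in Python 3. This sketch assumes cp1252 as a representative Windows code page; that choice is an assumption for illustration, since consoles often use cp437 or cp850 instead. ä and ° exist in cp1252, while ℉ does not:

```python
# Assumption for illustration: the console uses the cp1252 code page.
CODE_PAGE = "cp1252"
samples = ["\u00e4", "\u00b0", "\u2109"]   # ä, °, ℉

def fits_code_page(ch, codec=CODE_PAGE):
    """Return True if ch can be encoded in the given code page."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

support = {ch: fits_code_page(ch) for ch in samples}
# errors="replace" substitutes b"?" for anything the code page lacks
replaced = "".join(samples).encode(CODE_PAGE, errors="replace")
```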

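In Python 3 the prefix would have to be the bytes literal b"\u", because the lines read in binary mode are bytes and only bytes objects have a decode() method there.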
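A Python 3 sketch of this approach working on bytes, as read from a file opened in "rb". It swaps in the eight-digit \U escape with zero-padding (a substitution not in the original answer), because the four-digit \u escape cannot express 1F1E6. `escape_decode` is a hypothetical name:

```python
def escape_decode(code_point):
    """code_point: bytes such as b'2109', one line of a file opened in 'rb'."""
    # \U takes exactly eight hex digits, so zero-pad; unlike \u this
    # also covers code points above U+FFFF such as 1F1E6.
    return (b"\\U" + code_point.strip().zfill(8)).decode("unicode_escape")

# The four-digit \u form splits a five-digit code point: \u1F1E is read
# as one escape and the trailing "6" is kept as a literal character.
split = (b"\\u" + b"1F1E6").decode("unicode_escape")
```

Note that unicode_escape interprets any backslash sequence in its input, so this is only appropriate when the file is known to contain plain hex digits.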

Answered By: tdelaney

You want print unichr(int('00E0', 16)): convert the hex string to an integer, then pass it to unichr() to get the Unicode character for that code point.

Caveat: the standard Windows builds of Python 2 are "narrow" builds, so unichr() raises ValueError for code points above U+FFFF.
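The reason is that narrow builds store text as UTF-16 code units, so a character above U+FFFF occupies two units (a surrogate pair). Below is a minimal Python 3 sketch of the standard UTF-16 surrogate arithmetic, cross-checked against Python's own UTF-16 encoder; `to_surrogates` is a hypothetical helper. On a narrow Python 2 build, unichr(high) + unichr(low) built from these values should be one way to represent such a character, but that workaround is an assumption, not part of the answer:

```python
def to_surrogates(code_point):
    """Split a supplementary code point (> 0xFFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

high, low = to_surrogates(0x1F1E6)         # (0xD83C, 0xDDE6)
# Cross-check against Python 3's UTF-16 big-endian encoder.
utf16 = chr(0x1F1E6).encode("utf-16-be")
matches = utf16 == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```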

Solution: use Python 3.3+, where print(chr(int(line, 16))) works for every code point.
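A Python 3.3+ sketch of that one-liner applied to the whole file. Here io.StringIO objects stand in for both the input file and the console, and `print_code_points` is a hypothetical name. Since PEP 393, every code point, including U+1F1E6, is a single character of a str, so the narrow-build limit no longer applies:

```python
import io

def print_code_points(lines, out):
    """Write the character for each non-blank hex line to `out`."""
    for line in lines:
        line = line.strip()
        if line:                      # ignore blank lines
            print(chr(int(line, 16)), file=out)

# io.StringIO stands in for both the input file and the console here.
source = io.StringIO("2109\n00B0\n00E4\n1F1E6\n")
console = io.StringIO()
print_code_points(source, console)
lengths = [len(c) for c in console.getvalue().splitlines()]
```

With a real file, pass the open file object and sys.stdout instead; the console encoding caveat above still applies.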

In all cases you’ll still need to use a font that supports the glyphs for the codepoints.

Answered By: Mark Tolonen