Convert Unicode char code to char on Python

Question:

I have a list of Unicode character codes I need to convert into chars on python 2.7.

U+0021
U+0022
U+0023
.......
U+0024

How to do that?

Asked By: Arno

||

Answers:

a = 'U+aaa'
a.encode('ascii','ignore')
'aaa'

This will convert for unicode to Ascii which i think is what you want.

Answered By: Sherpa

This regular expression will replace all U+nnnn sequences with the corresponding Unicode character:

import re

s = u'''
U+0021
U+0022
U+0023
.......
U+0024
'''

s = re.sub(ur'U+([0-9A-F]{4})',lambda m: unichr(int(m.group(1),16)),s)

print(s)

Output:

!
"
#
.......
$

Explanation:

  • unichr gives the character of a codepoint, e.g. unichr(0x21) == u'!'.
  • int('0021',16) converts a hexadecimal string to an integer.
  • lambda(m): expression is an anonymous function that receives the regex match.
    It defines a function equivalent to def func(m): return expression but inline.
  • re.sub matches a pattern and sends each match to a function that returns the replacement. In this case, the pattern is U+hhhh where h is a hexadecimal digit, and the replacement function converts the hexadecimal digit string into a Unicode character.
Answered By: Mark Tolonen

The code written below will take every Unicode string and will convert into the string.

for I in list:
    print(I.encode('ascii', 'ignore'))
Answered By: Gautam Kumar

In case anyone using Python 3 and above wonders, how to do this effectively, I’ll leave this post here for reference, since I didn’t realize the author was asking about Python 2.7…

Just use the built-in python function chr():

char = chr(0x2474)
print(char)

Output:

Remember that the four digits in Unicode codenames U+WXYZ stand for a hexadecimal number WXYZ, which in python should be written as 0xWXYZ.

Answered By: aurel1510
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.