Convert Unicode char code to char on Python
Question:
I have a list of Unicode character codes I need to convert into chars on python 2.7.
U+0021
U+0022
U+0023
.......
U+0024
How to do that?
Answers:
a = 'U+aaa'
a.encode('ascii','ignore')
'aaa'
This will convert for unicode to Ascii which i think is what you want.
This regular expression will replace all U+nnnn
sequences with the corresponding Unicode character:
import re
s = u'''
U+0021
U+0022
U+0023
.......
U+0024
'''
s = re.sub(ur'U+([0-9A-F]{4})',lambda m: unichr(int(m.group(1),16)),s)
print(s)
Output:
!
"
#
.......
$
Explanation:
unichr
gives the character of a codepoint, e.g. unichr(0x21) == u'!'
.
int('0021',16)
converts a hexadecimal string to an integer.
lambda(m): expression
is an anonymous function that receives the regex match.
It defines a function equivalent to def func(m): return expression
but inline.
re.sub
matches a pattern and sends each match to a function that returns the replacement. In this case, the pattern is U+hhhh
where h is a hexadecimal digit, and the replacement function converts the hexadecimal digit string into a Unicode character.
The code written below will take every Unicode string and will convert into the string.
for I in list:
print(I.encode('ascii', 'ignore'))
In case anyone using Python 3 and above wonders, how to do this effectively, I’ll leave this post here for reference, since I didn’t realize the author was asking about Python 2.7…
Just use the built-in python function chr()
:
char = chr(0x2474)
print(char)
Output:
⑴
Remember that the four digits in Unicode codenames U+WXYZ
stand for a hexadecimal number WXYZ
, which in python should be written as 0xWXYZ
.
I have a list of Unicode character codes I need to convert into chars on python 2.7.
U+0021
U+0022
U+0023
.......
U+0024
How to do that?
a = 'U+aaa'
a.encode('ascii','ignore')
'aaa'
This will convert for unicode to Ascii which i think is what you want.
This regular expression will replace all U+nnnn
sequences with the corresponding Unicode character:
import re
s = u'''
U+0021
U+0022
U+0023
.......
U+0024
'''
s = re.sub(ur'U+([0-9A-F]{4})',lambda m: unichr(int(m.group(1),16)),s)
print(s)
Output:
!
"
#
.......
$
Explanation:
unichr
gives the character of a codepoint, e.g.unichr(0x21) == u'!'
.int('0021',16)
converts a hexadecimal string to an integer.lambda(m): expression
is an anonymous function that receives the regex match.
It defines a function equivalent todef func(m): return expression
but inline.re.sub
matches a pattern and sends each match to a function that returns the replacement. In this case, the pattern isU+hhhh
where h is a hexadecimal digit, and the replacement function converts the hexadecimal digit string into a Unicode character.
The code written below will take every Unicode string and will convert into the string.
for I in list:
print(I.encode('ascii', 'ignore'))
In case anyone using Python 3 and above wonders, how to do this effectively, I’ll leave this post here for reference, since I didn’t realize the author was asking about Python 2.7…
Just use the built-in python function chr()
:
char = chr(0x2474)
print(char)
Output:
⑴
Remember that the four digits in Unicode codenames U+WXYZ
stand for a hexadecimal number WXYZ
, which in python should be written as 0xWXYZ
.