How to get the Unicode character from a code point variable?
Question:
I have a variable which stores the string "u05e2"
(The value is constantly changing because I set it within a loop). I want to print the Hebrew letter with that Unicode value. I tried the following but it didn’t work:
>>> a = 'u05e2'
>>> print(u'\{}'.format(a))
I got \u05e2 instead of ע (in this case).
I also tried to do:
>>> a = 'u05e2'
>>> b = '\\' + a
>>> print(u'{}'.format(b))
Neither one worked. How can I fix this?
Thanks in advance!
Answers:
This is happening because you have to add the prefix u outside of the string and write the escape with a backslash inside it.
a = u'\u05e2'
print(a)
ע
Hope this helps you.
All you need is a \ before u05e2. To print a Unicode character, the string literal must contain a Unicode escape sequence.
a = '\u05e2'
print(u'{}'.format(a))
#Output
ע
When you try the other approach and put the \ in the format string inside print(), Python treats it as a literal backslash and does not show the desired result.
a = 'u05e2'
print(u'\{}'.format(a))
#Output
\u05e2
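Escape sequences such as \u05e2 are resolved only when Python parses a string literal, so joining a backslash to the text at runtime just produces six ordinary characters. If the escape text really has to be assembled at runtime, one possible workaround on Python 3 is to decode it with the unicode_escape codec. This is only a sketch, and the helper name decode_escape is made up for illustration:

```python
def decode_escape(code):
    """Turn text like 'u05e2' into the character it names (hypothetical helper)."""
    escaped = '\\' + code  # the six characters \u05e2, still not a Hebrew letter
    # unicode_escape interprets the escape the same way a string literal would
    return escaped.encode('ascii').decode('unicode_escape')

print(decode_escape('u05e2'))
```

The codec expects well-formed escape text, so malformed input raises UnicodeDecodeError rather than being passed through unchanged.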
A way to check that a string holds a single Unicode character is the built-in ord() function. It returns the Unicode code point (an integer) of the character passed to it and accepts only a string of length one.
a = '\u05e2'
print(ord(a)) #1506, the Unicode code point for the character stored in a
To print the Unicode character for the above code point (1506), use the c presentation type in the format specification. This is explained in the Python docs.
print('{0:c}'.format(1506))
#Output
ע
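As a quick check of the round trip described above, the following Python 3 sketch goes from the character to its code point with ord() and back again, both with the c presentation type and with chr(). The variable names are chosen here for illustration only:

```python
letter = '\u05e2'
code_point = ord(letter)                 # 1506, which is 0x5e2 in hexadecimal
via_format = '{0:c}'.format(code_point)  # integer formatted with the c type
via_chr = chr(code_point)                # chr() is the inverse of ord() on Python 3

# Both routes should give back the original letter
print(code_point, hex(code_point), via_format == via_chr == letter)
# prints: 1506 0x5e2 True
```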
If we pass a normal string literal without the backslash to ord(), we get an error, because that string is five separate characters rather than one Unicode character.
a = 'u05e2'
print(ord(a))
#Error
TypeError: ord() expected a character, but string of length 5 found
This seems like an X-Y Problem. If you want the Unicode character for a code point, use an integer variable and the function chr (or unichr on Python 2) instead of trying to format an escape code:
>>> for a in range(0x5e0,0x5eb):
... print(hex(a),chr(a))
...
0x5e0 נ
0x5e1 ס
0x5e2 ע
0x5e3 ף
0x5e4 פ
0x5e5 ץ
0x5e6 צ
0x5e7 ק
0x5e8 ר
0x5e9 ש
0x5ea ת
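If the loop can only supply text such as 'u05e2' rather than an integer, the same idea still applies: parse the hexadecimal digits with int() and pass the result to chr(). A minimal Python 3 sketch, assuming the input is a u followed by hex digits; char_from_code is a hypothetical helper name, not something from the question:

```python
def char_from_code(code):
    """Convert text such as 'u05e2' to its character (hypothetical helper)."""
    # Drop an optional leading backslash and the u/U marker, keeping the hex digits
    hex_digits = code.lstrip('\\uU')
    return chr(int(hex_digits, 16))

for code in ['u05e0', 'u05e1', 'u05e2']:
    print(code, char_from_code(code))
```

int() raises ValueError if the remaining text is not valid hexadecimal, which makes bad input easy to spot.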