Read and print unicode literal string from file in Python 3
Question:
If we want to print the symbols alpha and beta in Python, one way is:
print('\u03b1')
print('\u03b2')
Output:
α
β
What I want to do is write the Unicode code points for these symbols in a file, data.txt, then read the file and print the symbols.
data.txt
03b1
03b2
So, I tried
file = open('data.txt')
for word in file:
    greek_word = '\\u' + word
    print(greek_word)
However, I got the output as:
\u03b1
\u03b2
I am not able to figure out how to turn \u03b1 into α. I have read through the Unicode documentation and tried several permutations of encoding and decoding with UTF-8, etc., but could not succeed.
Python shows the type of both variables as str.
Answers:
Use int(hex_string, 16) to convert the hex representation into the numerical Unicode code point, and use chr() to turn that into the corresponding character:
with open('data.txt') as file:
    for word in file:
        # int() ignores surrounding whitespace, including the trailing newline
        greek_word = chr(int(word, 16))
        print(greek_word)
Note that this only handles single characters, not words, since you didn't specify a format in which complete words should be written in data.txt.
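If a line should hold a whole word, one possible format (an assumption here, since the question does not define one) is several hex code points separated by whitespace. A minimal sketch under that assumption, where decode_line is a hypothetical helper and the list stands in for the lines of data.txt:

```python
def decode_line(line):
    """Turn a line of whitespace-separated hex code points into a string."""
    # split() with no arguments also discards the trailing newline
    return ''.join(chr(int(token, 16)) for token in line.split())

# In-memory stand-in for the lines of data.txt (hypothetical word format)
lines = ['03b1\n', '03b2\n', '03b1 03b2 03b3\n']
for line in lines:
    print(decode_line(line))
```

A line with a single code point behaves exactly as in the loop above, so the same helper covers both cases.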
The conversion from '\u03b1' to 'α' happens when Python evaluates the string literal in the source code. In your case, '\\u' and '03b1' are evaluated independently and the results are then concatenated, so you just get the six characters \u03b1.
What you actually want is to evaluate the concatenated result. This can be done using the built-in eval function. This code should work as expected:
with open('data.txt') as file:
    for word in file:
        # strip the trailing newline so the quoted literal stays on one line
        greek_word = '\\u' + word.strip()
        print(eval(f"'{greek_word}'"))
Here, the concatenated value, \u03b1, is first quoted, which results in '\u03b1', which is then passed to eval, which evaluates it to 'α' as it would have normally done.
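Because eval runs whatever expression the file contains, the same idea can be sketched with ast.literal_eval from the standard library, which only accepts literals. This is a swapped-in alternative, not the method from the answer above, and unescape is a hypothetical helper name:

```python
import ast

def unescape(hex_digits):
    # Build the source text of a string literal, e.g. '\u03b1', and let
    # ast.literal_eval parse it; anything that is not a literal is rejected.
    return ast.literal_eval("'\\u" + hex_digits.strip() + "'")

print(unescape('03b1\n'))
print(unescape('03b2\n'))
```

As with the eval version, the escape sequence is only interpreted once the quoted text is parsed as a literal.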