Can't display Unicode chars with Flask

Question:

I have some strings in my database with Unicode characters that I can't display properly on my website. Interestingly, it does work correctly in one situation.

So it works when I do this:

@app.route('/')
def main():
    return render_template('home.html', text = '\u00e9ps\u00e9g')
# displays: épség

But it does not work when I do this (query the database and pass the string from result):

@app.route('/')
def main():
    text_string = getText()
    return render_template('home.html', text = text_string )
# displays: \u00e9ps\u00e9g

However, when I paste exactly the same string that the second version returns into the first version, it works perfectly.

I would like to understand why the first version works and the second does not. Both strings should be the same, but the one I get from the database is displayed unchanged, escape sequences and all, while the one I type in manually is displayed correctly. Unfortunately I have hundreds of strings, so I need to use the second version.

Asked By: rihekopo


Answers:

What you have in one case is a Unicode escape sequence in a string literal, which represents a single Unicode character. In the other case you have the literal characters \, u, … which are six separate characters. This can be illustrated using raw strings, which do not interpret Unicode escape sequences:

>>> text = '\u00e9ps\u00e9g'
>>> print(text)
épség
>>> text = r'\u00e9ps\u00e9g'
>>> print(text)
\u00e9ps\u00e9g

To convert a Unicode string with literal escape sequences, first you need a byte string, then decode with the unicode_escape codec. To obtain a byte string from a Unicode string with literal escape codes for non-ASCII characters, encode it with the ascii codec:

>>> text = r'\u00e9ps\u00e9g'
>>> print(text)
\u00e9ps\u00e9g
>>> print(text.encode('ascii').decode('unicode_escape'))
épség
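
As a minimal sketch, the conversion above could be wrapped in a helper and applied to each string before it is passed to the template. The name decode_literal_escapes and the variable stored are hypothetical, and the sketch assumes the stored text is pure ASCII apart from the literal escape sequences:

```python
def decode_literal_escapes(text: str) -> str:
    """Turn literal backslash-u sequences into the characters they name.

    Hypothetical helper, not part of Flask or the standard library.
    """
    # encode('ascii') raises UnicodeEncodeError if the text already contains
    # real non-ASCII characters, so this only suits pure-ASCII input.
    return text.encode('ascii').decode('unicode_escape')


# Assumed shape of a value coming back from the database:
# a literal backslash, 'u', and four hex digits, not a real 'é'.
stored = r'\u00e9ps\u00e9g'
decoded = decode_literal_escapes(stored)
print(decoded)
```

In the second route from the question, this would mean calling the helper on the value returned by getText() before handing it to render_template.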

From your comment you may have text from a JSON data file. If it is proper JSON, this should decode it:

>>> import json
>>> s = r'"\u00e9ps\u00e9g \ud83c\udf0f"'
>>> print(s)
"\u00e9ps\u00e9g \ud83c\udf0f"
>>> print(json.loads(s))
épség 🌏

Note that a JSON string is quoted. It would not decode without the double-quotes.
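
The quoting requirement can be checked with a short sketch. The variable names here are illustrative only; the example also shows that json.loads combines the surrogate pair into a single character:

```python
import json

quoted = r'"\u00e9ps\u00e9g \ud83c\udf0f"'  # a valid JSON string
bare = quoted.strip('"')                     # same text without the quotes

decoded = json.loads(quoted)
print(decoded)
print(len(decoded))  # the surrogate pair counts as one character

try:
    json.loads(bare)
    bare_ok = True
except json.JSONDecodeError:
    # Without the surrounding double quotes the text is not valid JSON.
    bare_ok = False
print(bare_ok)
```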

Answered By: Mark Tolonen