How do I convert Unicode escape sequences to Unicode characters in a Python string?

Question:

When I try to get the content of a tag using “unicode(head.contents[3])”, I get output similar to this: “Christensen Sk\xf6ld”. I want the escape sequence to be returned as the actual character in the string. How can I do this in Python?

Asked By: Vicky


Answers:

I suspect that it’s actually working correctly. By default, the interactive interpreter displays the repr of a string, which escapes non-ASCII characters, since not all terminals support Unicode. If you actually print the string, though, it should work. See the following example:

>>> u'\xcfa'
u'\xcfa'
>>> print u'\xcfa'
Ïa
Answered By: BJ Homer
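
As a rough illustration of the repr-versus-print distinction described above, here is a minimal sketch. It assumes Python 3 (the answers on this page use Python 2), where the built-in ascii() gives an escaped form comparable to what the Python 2 prompt echoes. The variable names are made up for the example.

```python
# Python 3 sketch (assumption: the original answers target Python 2).
s = '\xcfa'              # two characters: U+00CF (Ï) followed by 'a'

# ascii() returns an ASCII-only representation with non-ASCII characters
# escaped, similar to what the Python 2 prompt echoes for a unicode string.
escaped = ascii(s)
print(escaped)           # the escaped form, including the quotes

# The string itself still holds the real character, not an escape sequence.
print(len(s))            # number of characters in the string
```

Calling print(s) on a terminal that supports Unicode would display the character itself rather than the escape.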

Assuming Python sees the name as a normal string, you’ll first have to decode it to unicode:

>>> name
'Christensen Sk\xf6ld'
>>> unicode(name, 'latin-1')
u'Christensen Sk\xf6ld'

Another way of achieving this:

>>> name.decode('latin-1')
u'Christensen Sk\xf6ld'

Note the “u” in front of the string, signalling that it is Unicode. If you print this, the accented letter is shown properly:

>>> print name.decode('latin-1')
Christensen Sköld

BTW: when necessary, you can use the “encode” method to turn the Unicode string into e.g. a UTF-8 string:

>>> name.decode('latin-1').encode('utf-8')
'Christensen Sk\xc3\xb6ld'
Answered By: Mark van Lent
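
For readers on Python 3, the same decode and encode round trip might look like the sketch below. It assumes the name arrives as Latin-1 encoded bytes, which is a guess about the input in the same way as in the answer above. The names raw, name and utf8 are illustrative only.

```python
# Python 3 sketch: bytes take the role of the Python 2 byte string.
raw = b'Christensen Sk\xf6ld'      # assumed input: Latin-1 encoded bytes

name = raw.decode('latin-1')       # bytes -> text (str)
utf8 = name.encode('utf-8')        # text -> UTF-8 encoded bytes

print(ascii(name))                 # escaped view of the decoded text
print(utf8)                        # UTF-8 bytes; the o-umlaut takes two bytes
```

If the bytes were actually in a different encoding, decoding them as Latin-1 would not raise an error but would yield the wrong characters, so the encoding should be confirmed where possible.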

Given a byte string with Unicode escapes, such as b"\N{SNOWMAN}", b"\N{SNOWMAN}".decode('unicode-escape') will produce the expected Unicode string u'\u2603'.

Answered By: joeforker
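
A minimal Python 3 sketch of the same idea, assuming the escapes are literally present in the data as a backslash followed by letters. In Python 3 a text string has no decode method, so this sketch first encodes it to ASCII bytes. That step is an assumption that only holds for ASCII-only input, and it raises an error otherwise.

```python
# Hypothetical input: text holding a literal backslash escape (6 characters),
# not the snowman character itself.
escaped = r'\u2603'

# str has no .decode() in Python 3, so go through ASCII bytes first.
# Assumption: the input is pure ASCII; non-ASCII input raises UnicodeEncodeError.
snowman = escaped.encode('ascii').decode('unicode_escape')

# Byte strings can be decoded directly, as in the answer above.
also_snowman = rb'\N{SNOWMAN}'.decode('unicode_escape')

print(ascii(snowman), snowman == also_snowman)
```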