Converting a byte string to a Unicode string


I have the following code:

a = "\u0432"
b = u"\u0432"
c = b"\u0432"
d = c.decode('utf8')

print(type(a), a)
print(type(b), b)
print(type(c), c)
print(type(d), d)

And the output:

<class 'str'> в
<class 'str'> в
<class 'bytes'> b'\\u0432'
<class 'str'> \u0432

Why, in the last case, do I see the character's code instead of the character?
How can I convert a byte string to a Unicode string so that the output shows the character instead of its code?

Asked By: Alex T



In strings (or Unicode objects in Python 2), \u has a special meaning, namely “here comes a Unicode character specified by its Unicode code point”. Hence u"\u0432" results in the character в.

The b'' prefix tells you this is a sequence of 8-bit bytes, and a bytes object has no Unicode characters, so \u has no special meaning there. Hence b"\u0432" is just the sequence of the six bytes \, u, 0, 4, 3 and 2.
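A quick way to confirm this is to inspect the bytes directly. The sketch below uses a raw bytes literal, rb"\u0432", which builds the same six-byte value as b"\u0432" without the invalid-escape warning that recent Python 3 versions print for the latter.

```python
# A raw bytes literal keeps the backslash, so this is the same
# six-byte value the question builds with b"\u0432".
c = rb"\u0432"

print(len(c))         # 6
print(list(c))        # [92, 117, 48, 52, 51, 50]  (\, u, 0, 4, 3, 2)
print(bytes([c[0]]))  # b'\\'  (a single backslash byte)

# For comparison, the actual character U+0432 is two bytes in UTF-8.
print("\u0432".encode("utf-8"))  # b'\xd0\xb2'
```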

Essentially you have an 8-bit string containing not a Unicode character, but the specification of a Unicode character.

You can convert this specification with the unicode_escape codec:

>>> c.decode('unicode_escape')
'в'
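A minimal sketch applying this to the question's variables, assuming Python 3. The second half covers the case where you already hold a str such as d: encode it back to bytes first. Using the "ascii" codec for that step assumes the text contains only ASCII characters.

```python
c = rb"\u0432"  # bytes: backslash, u, 0, 4, 3, 2

# unicode_escape interprets backslash escapes in the bytes and returns a str.
d = c.decode("unicode_escape")
print(type(d), d)  # <class 'str'> в

# Starting from a str that contains the escape text (like the question's d),
# turn it back into bytes first, then decode the escapes.
s = "\\u0432"
print(s.encode("ascii").decode("unicode_escape"))  # в
```

Note that unicode_escape decodes any non-ASCII bytes as Latin-1, so it is only safe on input that is plain ASCII apart from the escape sequences; UTF-8 text mixed with escapes will come out garbled.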
Answered By: Lennart Regebro

Loved Lennart’s answer. It put me on the right track for solving the particular problem I was facing. What I added was the ability to produce HTML-compatible code for \u???? sequences in strings. Basically, only one line was needed:

results = results.replace('\\u', '&#x')
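One caveat: the plain replace leaves each entity without its closing semicolon (&#x0432 rather than &#x0432;) and also rewrites any \u that is not followed by four hex digits. A regex-based variant, my own sketch rather than part of this answer, only touches real \uXXXX sequences and emits complete entities; the function name escapes_to_html_entities is made up for illustration.

```python
import re

# A literal backslash, the letter u, then exactly four hex digits (captured).
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def escapes_to_html_entities(text: str) -> str:
    """Turn each \\uXXXX sequence into a complete &#xXXXX; entity."""
    return _ESCAPE.sub(r"&#x\1;", text)

print(escapes_to_html_entities(r"\u0432\u0430 and \upload"))
# &#x0432;&#x0430; and \upload
```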

This all came about from a need to convert JSON results to something that displays well in a browser. Here is some test code that is integrated with a cloud application:

# References:

import urllib.request
import json

body = [ { "query": "co-development and", "page": 1, "pageSize": 100 } ]
myurl = ""
req = urllib.request.Request(myurl)
req.add_header('Content-Type', 'application/json; charset=utf-8')
jsondata = json.dumps(body)
jsondatabytes = jsondata.encode('utf-8') # needs to be bytes
req.add_header('Content-Length', len(jsondatabytes))
print('\n', jsondatabytes, '\n')
response = urllib.request.urlopen(req, jsondatabytes)
results = response.read()  # response body as bytes
results = results.decode('utf-8')
results = results.replace('\\u', '&#x')  # produces html hex version of \u???? unicode characters
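If the response body is JSON, the \uXXXX sequences are JSON string escapes, so parsing it with json.loads already turns them into real characters. The sketch below uses a made-up payload in place of the elided endpoint; the str.encode error handler "xmlcharrefreplace" then produces numeric HTML character references for anything outside ASCII.

```python
import json

# Hypothetical response body: JSON text with \uXXXX escapes, as
# json.dumps produces by default (ensure_ascii=True).
raw = json.dumps({"title": "\u0432\u0430"}).encode("utf-8")
print(raw)  # b'{"title": "\\u0432\\u0430"}'

# Parsing the JSON decodes the escapes into real characters.
data = json.loads(raw.decode("utf-8"))
print(data["title"])  # ва

# For HTML output, replace non-ASCII characters with numeric references.
html_safe = data["title"].encode("ascii", "xmlcharrefreplace").decode("ascii")
print(html_safe)  # &#1074;&#1072;
```

The references are decimal (&#1074;) rather than hex, which browsers treat the same way. Text inserted into HTML would still need html.escape for characters such as < and &.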
Answered By: SoothingMist