Convert octal representation of UTF-8

Question:

I have a variable like this:

>>> s = '\\320\\227\\320\\264\\320\\260\\320\\275\\320\\270\\320\\265 \\320\\261\\321\\213\\320\\262\\321\\210\\320\\265\\320\\271'
>>> print(s)
\320\227\320\264\320\260\320\275\320\270\320\265 \320\261\321\213\320\262\321\210\320\265\320\271

This contains the octal escape representations of the UTF-8 encoding of the string “Здание бывшей” (octal 320 227 = hex D0 97 = UTF-8 for “З”). How can I decode this string to “Здание бывшей”?

Asked By: Dhamo R

||

Answers:

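This is a bit of a hack: split the string on the backslashes, parse each piece as an octal number, and decode the resulting bytes as UTF-8.

s = '\\320\\227\\320\\264\\320\\260\\320\\275\\320\\270\\320\\265 \\320\\261\\321\\213\\320\\262\\321\\210\\320\\265\\320\\271'

# Each piece after a backslash is one octal byte value; int() ignores the trailing space in "265 ".
b = bytes([int(i, 8) for i in s.split("\\")[1:]])

print(b.decode("utf8"))

yields: Зданиебывшей (the space between the two words is lost, because it is not written as an escape)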
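The split approach drops any character that is not written as an escape, such as the space in the string above. Here is a minimal sketch of a variant that keeps such characters. It assumes the input is ASCII text in which every escape has exactly three octal digits. The helper name decode_octal_escapes is hypothetical.

```python
import re

def decode_octal_escapes(text):
    """Replace each three-digit octal escape with its byte, then decode as UTF-8."""
    raw = text.encode("ascii")  # assumption: everything outside the escapes is ASCII
    # Match a backslash followed by three octal digits in the range 000-377.
    unescaped = re.sub(
        rb"\\([0-3][0-7]{2})",
        lambda m: bytes([int(m.group(1), 8)]),
        raw,
    )
    return unescaped.decode("utf-8")

s = '\\320\\227\\320\\264\\320\\260\\320\\275\\320\\270\\320\\265 \\320\\261\\321\\213\\320\\262\\321\\210\\320\\265\\320\\271'
print(decode_octal_escapes(s))
```

Because only the escapes are substituted, the literal space passes through unchanged.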

Or use the codecs module.

import codecs

b2 = codecs.escape_decode(s)[0]
print(b2.decode("utf8"))

This decodes the same bytes and also keeps the space, so it prints: Здание бывшей
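codecs.escape_decode does not appear in the documented codecs API, so a route that relies only on documented codecs may be preferable. The following is a sketch that assumes the string is pure ASCII. The unicode_escape codec turns each octal escape into the code point with that value, and latin-1 then maps code points 0-255 back to bytes with the same values.

```python
s = '\\320\\227\\320\\264\\320\\260\\320\\275\\320\\270\\320\\265 \\320\\261\\321\\213\\320\\262\\321\\210\\320\\265\\320\\271'

# unicode_escape interprets \320 as the character U+00D0, \227 as U+0097, and so on.
as_chars = s.encode("ascii").decode("unicode_escape")

# latin-1 maps each code point in 0-255 to the single byte with the same value.
raw = as_chars.encode("latin-1")

print(raw.decode("utf-8"))
```

The intermediate string as_chars is mojibake and only serves as a carrier for the byte values.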

Answered By: matt