Replacing = with '\x' and then decoding in Python

Question:

I fetched the subject of an email message using Python modules and received the string

'=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C?=' 

I know the string is encoded in UTF-8. Python strings have a decode method for such data, but to use it I had to replace each = sign with \x. After making that replacement by hand and printing the decoded result, I get the string سلام_کجائی, which is exactly what I want. The question is: how can I do the replacement automatically? It seems harder than a simple call to a string function such as replace.

Below is the code I used after the manual replacement:

r='\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85_\xDA\xA9\xD8\xAC\xD8\xA7\xD8\xA6\xDB\x8C'
print r.decode('utf-8')

I would appreciate any workable idea.

Asked By: alexander


Answers:

This sort of encoding is known as quoted-printable. There is a Python module for performing encoding and decoding.

You’re right that it’s just a pure quoting of binary strings, so you need to apply UTF-8 decoding afterwards. (Assuming the string is in UTF-8, of course. But that looks correct although I don’t know the language.)

import quopri

print quopri.decodestring("=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C?=").decode("utf-8")
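
On Python 3 the same idea applies, but `quopri.decodestring` operates on bytes rather than text. A minimal sketch, assuming the trailing `?=` is the end marker of an encoded header word rather than data, so it is left out here:

```python
import quopri

# Quoted-printable payload from the question, without the trailing "?="
# (assumed to be a header terminator, not part of the text).
encoded = b"=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C"

raw_bytes = quopri.decodestring(encoded)  # quoted-printable -> raw bytes
text = raw_bytes.decode("utf-8")          # raw bytes -> str
print(text)
```

Passing `header=True` to `quopri.decodestring` additionally turns `_` into a space, which is how underscores are read inside email headers.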
Answered By: svk

Just decode it from quoted-printable to get a UTF-8-encoded bytestring:

In [35]: s = '=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C?='
In [36]: s.decode('quoted-printable')
Out[36]: '\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85_\xda\xa9\xd8\xac\xd8\xa7\xd8\xa6\xdb\x8c?'

Then, if needed, from utf-8 to unicode:

In [37]: s.decode('quoted-printable').decode('utf8')
Out[37]: u'\u0633\u0644\u0627\u0645_\u06a9\u062c\u0627\u0626\u06cc?'

 

In [39]: print s.decode('quoted-printable')
سلام_کجائی?
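
The session above is Python 2; in Python 3, `str` has no `decode` method. Since the value came from an email Subject, it is probably a fragment of an RFC 2047 encoded word, which the standard library can decode directly. A hedged sketch, assuming the full header looked like the one below (the `=?utf-8?Q?` prefix is reconstructed for illustration and does not appear in the question):

```python
from email.header import decode_header

# Assumption: the complete Subject header was an RFC 2047 encoded word.
# The "=?utf-8?Q?" prefix is a guess; the question shows only what follows it.
subject = "=?utf-8?Q?=D8=B3=D9=84=D8=A7=D9=85_=DA=A9=D8=AC=D8=A7=D8=A6=DB=8C?="

parts = []
for payload, charset in decode_header(subject):
    if isinstance(payload, bytes):
        # Encoded words come back as bytes plus the declared charset.
        payload = payload.decode(charset or "ascii")
    parts.append(payload)

decoded_subject = "".join(parts)
print(decoded_subject)
```

In the "Q" header encoding an underscore stands for a space, so the decoded subject contains a space where the raw string shows `_`.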
Answered By: Pavel Anossov

For Python 3, to decode a string of \x escapes, write it as a bytes literal with the b prefix:

>>> b"\xe4\xb8\x8b\xe4\xb8\x80\xe6\xad\xa5".decode("utf-8")
'下一步'
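
The b prefix only helps when the escapes are typed into source code. If the text arrives at runtime containing literal backslash-x sequences, one possible approach is sketched below; the input value is hypothetical, and the sketch assumes every escape has the form `\xNN`:

```python
# Text that literally contains backslash-x sequences (hypothetical input),
# e.g. read from a file or log rather than typed as a bytes literal.
escaped = r"\xe4\xb8\x8b\xe4\xb8\x80\xe6\xad\xa5"

# 1. Interpret the escapes: each \xNN becomes the code point U+00NN.
# 2. latin-1 maps U+0000..U+00FF one-to-one onto the bytes 0x00..0xFF.
# 3. Decode the resulting bytes as UTF-8.
raw_bytes = escaped.encode("ascii").decode("unicode_escape").encode("latin-1")
text = raw_bytes.decode("utf-8")
print(text)
```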
Answered By: crifan