unicode() vs. str.decode() for a UTF-8 encoded byte string (Python 2.x)

Question:

Is there any reason to prefer unicode(somestring, 'utf8') as opposed to somestring.decode('utf8')?

My only thought is that .decode() is a bound method, so Python may be able to resolve it more efficiently, but correct me if I’m wrong.

Asked By: ʞɔıu


Answers:

It’s easy to benchmark it:

>>> from timeit import Timer
>>> ts = Timer("s.decode('utf-8')", "s = 'ééé'")
>>> ts.timeit()
8.9185450077056885
>>> tu = Timer("unicode(s, 'utf-8')", "s = 'ééé'")
>>> tu.timeit()
2.7656929492950439

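In this run, unicode() is clearly faster: about 2.8 seconds versus 8.9 seconds for timeit's default of one million calls, roughly a threefold difference.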

FWIW, I don’t know where you get the impression that methods would be faster – it’s quite the contrary.
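For anyone who wants to repeat the measurement, here is a minimal sketch that runs on both Python 2.6+ and Python 3. It is not the script used above: text_type, via_method, via_constructor and best_time are hypothetical names, and on Python 3 the constructor being timed is str(data, 'utf-8') because unicode() no longer exists. Wrapping each call in a function adds the same overhead to both sides, so the gap will look smaller than in the transcript, and the numbers will vary by interpreter version and machine.

```python
import sys
import timeit

# Hypothetical alias: `unicode` on Python 2, `str` on Python 3.
# The unused branch is never evaluated, so this does not raise NameError.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821

# UTF-8 bytes for the three-character string used above, written as
# escapes so the file needs no source-encoding declaration.
data = b'\xc3\xa9\xc3\xa9\xc3\xa9'


def via_method():
    # Decode through the byte string's own method.
    return data.decode('utf-8')


def via_constructor():
    # Decode through the text type's constructor.
    return text_type(data, 'utf-8')


def best_time(func, number=100000, repeat=3):
    # Take the minimum of several runs; it is the least noisy figure.
    return min(timeit.repeat(func, number=number, repeat=repeat))


method_seconds = best_time(via_method)
constructor_seconds = best_time(via_constructor)
print('decode():    %.4f s' % method_seconds)
print('constructor: %.4f s' % constructor_seconds)
```

Both functions return the same text, so any difference in the two timings comes from the call path rather than from the result.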

Answered By: bruno desthuilliers

I’d prefer 'something'.decode(...) since the unicode type is no longer there in Python 3.0, while text = b'binarydata'.decode(encoding) is still valid.
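As a rough illustration of that portability argument, the sketch below decodes with .decode() only, so the same code runs on Python 2.6+ and Python 3. The helper name to_text is hypothetical and not part of any library; it assumes that anything which is not a byte string is already text.

```python
def to_text(value, encoding='utf-8'):
    """Return text for a byte string; pass anything else through unchanged."""
    # On Python 2, `bytes` is an alias for `str`, so this branch handles
    # ordinary byte strings there as well as `bytes` objects on Python 3.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b'caf\xc3\xa9'   # 5 bytes of UTF-8
text = to_text(raw)    # 4 characters of text
print('%d bytes, %d characters' % (len(raw), len(text)))
```

Calling to_text() on a value that is already text returns it as is, which avoids the implicit ASCII round trip that Python 2 performs when .decode() is called on a unicode object.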

Answered By: dF.