hash unicode string in python

Question:

I am trying to hash some Unicode strings:

hashlib.sha1(s).hexdigest()
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-81: ordinal not in range(128)

where s is something like:

œ∑¡™£¢∞§¶•ªº–≠œ∑´®†¥¨ˆøπ“‘åß∂ƒ©˙∆˚¬…æΩ≈ç√∫˜µ≤≥÷åйцукенгшщзхъфывапролджэячсмитьбююю..юбьтијџўќ†њѓѕ’‘“«««dzћ÷…•∆љl«єђxcvіƒm≤≥ї!@#$©^&*(()––––––––––∆∆∆∆∆∆∆∆∆∆∆∆∆∆∆∆∆∆∆•…÷ћzdzћ÷…•∆љlљ∆•…÷ћzћ÷…•∆љ∆•…љ∆•…љ∆•…∆љ•…∆љ•…љ∆•…∆•…∆•…∆•∆…•÷∆•…÷∆•…÷∆•…÷∆•…÷∆•…÷∆•…÷∆•…

What should I fix?

Asked By: Vladimir Keleshev


Answers:

Apparently hashlib.sha1 isn’t expecting a unicode object, but rather a sequence of bytes in a str object. Encoding your unicode string to a sequence of bytes (using, say, the UTF-8 encoding) should fix it:

>>> import hashlib
>>> s = u'é'
>>> hashlib.sha1(s.encode('utf-8'))
<sha1 HASH object @ 029576A0>

The error is because it is trying to convert the unicode object to a str automatically, using the default ascii encoding, which can’t handle all those non-ASCII characters (since your string isn’t pure ASCII).
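As a minimal sketch of the fix described above, the helper below encodes the text explicitly before hashing it. The name sha1_hex and the sample string are made up for illustration, and the snippet assumes Python 3, where passing an unencoded str to hashlib raises a TypeError instead of the UnicodeEncodeError seen in Python 2.

```python
import hashlib


def sha1_hex(text, encoding="utf-8"):
    """Return the SHA-1 hex digest of a text string.

    sha1_hex is a hypothetical helper, not part of hashlib. The text is
    encoded explicitly, so no implicit ASCII conversion is attempted.
    """
    return hashlib.sha1(text.encode(encoding)).hexdigest()


# Sample non-ASCII input (an assumption, shortened from the question).
s = u"œ∑¡™£¢∞§¶•ªº–≠ йцукен"
digest = sha1_hex(s)
print(digest)  # a 40-character hexadecimal string
```

Keeping the encoding as a parameter makes the choice of bytes visible at the call site, which matters because a different encoding gives a different digest.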

A good starting point for learning more about Unicode and encodings is the Python docs, and this article by Joel Spolsky.

Answered By: Cameron

Encode the string as UTF-8 before hashing it. For example:

>>> import hashlib
>>> import random
>>> hashlib.sha256(str(random.getrandbits(256)).encode('utf-8')).hexdigest()
'cd183a211ed2434eac4f31b317c573c50e6c24e3a28b82ddcb0bf8bedf387a9f'

Answered By: Jaykumar Patel

You hash bytes, not strings. So you need to know which bytes you actually want to hash: for example, the UTF-8 encoding of the string, or its UTF-16 encoding, and so on.
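To illustrate the point, here is a small sketch (assuming Python 3; the variable names are chosen only for this example) that hashes the same one-character string under two encodings. The byte sequences differ, so the digests differ too.

```python
import hashlib

# The single character U+00E9 (é), written as an escape so the example
# does not depend on the source file's encoding.
text = "\u00e9"

utf8_bytes = text.encode("utf-8")       # two bytes: 0xC3 0xA9
utf16_bytes = text.encode("utf-16-le")  # two bytes: 0xE9 0x00

digest_utf8 = hashlib.sha1(utf8_bytes).hexdigest()
digest_utf16 = hashlib.sha1(utf16_bytes).hexdigest()

# Same text, different bytes, different hashes.
print(utf8_bytes, digest_utf8)
print(utf16_bytes, digest_utf16)
print(digest_utf8 == digest_utf16)
```

Whichever encoding you pick, both sides of a comparison have to use the same one, or hashes of identical text will not match.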

Answered By: drakorg