Persistent Hashing of Strings in Python
Question:
How would you convert an arbitrary string into a unique integer that is the same across Python sessions and platforms? For example, hash('my string') wouldn’t work because it returns a different value for each Python session and platform.
Answers:
Use a hash algorithm such as MD5 or SHA1, then convert the hexdigest to an integer via int():
>>> import hashlib
>>> # In Python 3, hashlib needs bytes, so pass a bytes literal (or encode the str first)
>>> int(hashlib.md5(b'Hello, world!').hexdigest(), 16)
144653930895353261282233826065192032313
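For str input in Python 3, the string has to be encoded to bytes before hashing. A minimal sketch of the approach above, assuming UTF-8 encoding; the helper name stable_hash is hypothetical and not part of the original answer:

```python
import hashlib

def stable_hash(text, encoding="utf-8"):
    """Return an integer derived from the MD5 digest of `text`.

    `stable_hash` is a hypothetical helper name, and the UTF-8 default
    is an assumption; both sides must agree on the encoding.
    """
    digest = hashlib.md5(text.encode(encoding)).digest()
    # Interpreting the 16 digest bytes as a big-endian integer gives
    # the same value as int(hexdigest, 16).
    return int.from_bytes(digest, "big")
```

The result depends only on the encoded bytes, so it should be the same across sessions and platforms. It always fits in 128 bits, which also means distinct strings can collide, so it is consistent rather than strictly unique.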
First off, you probably don’t really want the integers to be truly unique. If you do, the numbers are unbounded in size, because there is no limit on the number of distinct strings. If that really is what you want, you could use a bignum library (Python’s built-in int is already arbitrary-precision) and interpret the bits of the string as the representation of a (potentially very large) integer. If your strings can include the null character (\0), you should prepend a 1, so that strings differing only in leading null characters remain distinguishable.
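The truly unique mapping described above can be sketched as follows. This assumes UTF-8 encoding and big-endian byte order; the function names string_to_int and int_to_string are hypothetical. The leading 0x01 byte plays the role of the prepended 1: without it, leading zero bytes would vanish when the bytes are read as a number.

```python
def string_to_int(text, encoding="utf-8"):
    """Map a string to a unique non-negative integer (hypothetical helper)."""
    # Prepend 0x01 so that leading zero bytes in the encoded string
    # still contribute to the resulting integer.
    data = b"\x01" + text.encode(encoding)
    return int.from_bytes(data, "big")

def int_to_string(number, encoding="utf-8"):
    """Invert string_to_int (hypothetical helper)."""
    length = (number.bit_length() + 7) // 8
    data = number.to_bytes(length, "big")
    # Drop the 0x01 marker byte that string_to_int prepended.
    return data[1:].decode(encoding)
```

Because the mapping is reversible, no two strings share an integer, but the integer grows by eight bits for every byte of input.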