Why doesn't Python's hash() function give the same values when run on an Android implementation?

Question:

I believed that the hash() function works the same in all Python interpreters, but it differs when I run it on my mobile using Python for Android. I get the same hash value for strings and numbers, but when I hash built-in data types the hash value differs.

PC Python Interpreter (Python 2.7.3)

>>> hash(int)
31585118
>>> hash("hello sl4a")
1532079858
>>> hash(101)
101

Mobile Python Interpreter (Python 2.6.2)

>>> hash(int)
-2146549248
>>> hash("hello sl4a")
1532079858
>>> hash(101)
101

Can anyone tell me whether this is a bug or whether I have misunderstood something?

Asked By: Balakrishnan


Answers:

Hashing of things like int relies on id(), which is not guaranteed constant between runs or between interpreters. That is, hash(int) will always produce the same result during a program’s run, but might not compare equal between runs, either on the same platform or on different platforms.

BTW, while hash randomization is available in Python 2.7.3, it’s disabled by default there (it only became the default in Python 3.3). Since your strings and numbers are hashing equally, clearly it’s not the issue here.

Answered By: Sneftel

Since Python 3.3, hash() of str and bytes values is randomised by default each time you start a new interpreter, to prevent dictionary insertion DoS attacks.

Prior to that, hash() was different for 32-bit and 64-bit builds anyway.

If you want something that hashes to the same thing every time, use one of the hashes in hashlib:

>>> import hashlib
>>> hashlib.algorithms
('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512')
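
As a rough sketch of that suggestion (the helper name stable_hash is made up for illustration, and SHA-256 is just one reasonable choice), a digest of the encoded text depends only on its bytes, so it should come out the same across runs, builds and platforms:

```python
import hashlib


def stable_hash(text):
    """Return a hex digest that depends only on the UTF-8 bytes of text."""
    # hashlib digests are not salted per process, unlike the built-in
    # hash() of strings on Python 3.3+.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(stable_hash("hello sl4a"))
```

The price is speed: this is far slower than the built-in hash(), so it fits persistent keys or checksums rather than in-memory dict lookups.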

Answered By: John La Rooy

for old python (at least, my Python 2.7), it seems that

hash(<some type>) = id(<type>) / 16

and for CPython id() is the address in memory – http://docs.python.org/2/library/functions.html#id

>>> id(int) / hash(int)
16
>>> id(int) % hash(int)
0

so my guess is that the Android port has some strange convention for memory addresses?

anyway, given the above, hashes for types (and other built-ins i guess) will differ across installs because the type objects live at different addresses.

in contrast, hashes for values such as strings and numbers (before the random stuff was added) are calculated from their values and so are likely repeatable.
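
The id()/16 relation can be checked with a small sketch. It assumes that CPython (2.7 and later, as far as I know) rotates the object's address right by 4 bits to get the default hash; that is an implementation detail, not a guarantee, and the helper name pointer_hash_guess is made up here. When the low 4 bits of the address are zero, the rotation is the same as dividing by 16.

```python
import struct


def pointer_hash_guess(obj):
    """Guess hash(obj) from id(obj), assuming a rotate-right-by-4 scheme."""
    bits = struct.calcsize("P") * 8           # pointer width of this build
    mask = (1 << bits) - 1
    addr = id(obj)                            # the address, on CPython
    rotated = ((addr >> 4) | (addr << (bits - 4))) & mask
    if rotated >= 1 << (bits - 1):            # reinterpret as a signed value
        rotated -= 1 << bits
    return -2 if rotated == -1 else rotated   # -1 is never returned by hash()


# Report whether the guess matches on this interpreter; it may not on
# other implementations or versions.
print(hash(int), pointer_hash_guess(int), hash(int) == pointer_hash_guess(int))
```

Whatever the exact formula, the input is an address, so the result is only meaningful within one running process.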

PS but there’s at least one more CPython wrinkle:

>>> for i in range(-1000,1000):
...     if hash(i) != i: print(i)
...
-1

there’s an answer here somewhere explaining that one…
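
That wrinkle can be reproduced with the sketch below. As far as I know the reason is that CPython's C-level hash functions use -1 as an error return, so a computed hash of -1 gets replaced by -2; the helper name is made up for illustration.

```python
def ints_with_surprising_hash(start, stop):
    """Return the integers in range(start, stop) whose hash is not themselves."""
    return [i for i in range(start, stop) if hash(i) != i]


print(ints_with_surprising_hash(-1000, 1000))
print(hash(-1))
```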

Answered By: andrew cooke

With CPython, for efficiency reasons, hash() on internal objects returns a value derived from id(), which in turn returns the memory location (“address”) of the object.

From one CPython-based interpreter to another, the memory location of such an object is subject to change. Depending on your OS, it could even change from one run to another.

Answered By: Sylvain Leroux

Since Python 3.3, the default hash algorithm produces hash values that are salted with a random value, which differs even between different Python processes on the same machine.

Hash randomization is currently implemented only for strings (and bytes), since they were considered the data type most likely to be captured from outside and used in an attack.

A frozenset whose elements hash deterministically (integers, for example) consistently produces the same hash value across different processes, and across machines running the same build.
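
One way to see this is to compute the hash in freshly started interpreters. The sketch below is only an illustration: the helper name hash_in_fresh_interpreter is made up, and it relies on the PYTHONHASHSEED environment variable, where a value of 0 switches randomization off.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(expression, seed=None):
    """Evaluate hash(<expression>) in a new Python process and return it."""
    env = dict(os.environ)
    if seed is None:
        env.pop("PYTHONHASHSEED", None)    # leave randomization at its default
    else:
        env["PYTHONHASHSEED"] = str(seed)  # 0 switches randomization off
    code = "print(hash(%s))" % expression
    output = subprocess.check_output([sys.executable, "-c", code], env=env)
    return int(output.decode().strip())


if __name__ == "__main__":
    # Each expression is hashed in two separate processes for comparison.
    for expr in ("'hello sl4a'", "101", "frozenset([1, 2, 3])"):
        print(expr, hash_in_fresh_interpreter(expr), hash_in_fresh_interpreter(expr))
```

On Python 3.3+ the two string hashes should almost always differ unless a seed is fixed, while the integer and the frozenset of integers should agree.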

Source: https://www.quora.com/Do-two-computers-produce-the-same-hash-for-identical-objects-in-Python

Answered By: user9562553