Positive integer from Python hash() function
Question:
I want to use the Python hash() function to get integer hashes from objects. But the built-in hash() can give negative values, and I want only positive ones. I also want it to work sensibly on both 32-bit and 64-bit platforms.
I.e. on 32-bit Python, hash() can return an integer in the range -2**31 to 2**31 - 1.
On 64-bit systems, hash() can return an integer in the range -2**63 to 2**63 - 1.
But I want a hash in the range 0 to 2**32-1 on 32-bit systems, and 0 to 2**64-1 on 64-bit systems.
What is the best way to convert the hash value to its equivalent positive value within the range of the 32- or 64-bit target platform?
(Context: I’m trying to make a new random.Random style class. According to the random.Random.seed() docs, the seed “optional argument x can be any hashable object.” So I’d like to duplicate that functionality, except that my seed algorithm can’t handle negative integer values, only positive.)
Answers:
Using sys.maxsize:
>>> import sys
>>> sys.maxsize
9223372036854775807L
>>> hash('asdf')
-618826466
>>> hash('asdf') % ((sys.maxsize + 1) * 2)
18446744073090725150L
Alternative using ctypes.c_size_t:
>>> import ctypes
>>> ctypes.c_size_t(hash('asdf')).value
18446744073090725150L
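The ctypes conversion and the modulo expression should agree wherever size_t and Py_ssize_t have the same width, which is the usual case. A small check, written for Python 3, with sample values chosen purely for illustration:

```python
import ctypes
import sys

modulus = (sys.maxsize + 1) * 2  # 2**32 or 2**64, depending on the build

# Sample signed values, including both ends of the hash range.
samples = [0, 1, -1, -618826466, sys.maxsize, -sys.maxsize - 1]

for h in samples:
    via_ctypes = ctypes.c_size_t(h).value   # reinterpret as unsigned size_t
    via_modulo = h % modulus                # arithmetic equivalent
    assert via_ctypes == via_modulo
    print(h, "->", via_ctypes)
```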
How about:
h = hash(o)
if h < 0:
    h += sys.maxsize
This uses sys.maxsize to be portable between 32- and 64-bit systems.
Just using sys.maxsize is wrong for obvious reasons (it being 2**n - 1 and not 2**n), but the fix is easy enough:
h = hash(obj)
h += sys.maxsize + 1
For performance reasons you may want to split the sys.maxsize + 1 into two separate assignments, to avoid temporarily creating a long integer for most negative numbers, although I doubt this is going to matter much.
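Note that adding sys.maxsize + 1 unconditionally shifts the whole signed range upward, which is a different mapping from the modulo and ctypes approaches: every result is non-negative, but non-negative hashes no longer keep their original value. A sketch comparing the two on Python 3, with hypothetical function names:

```python
import sys


def shifted(h):
    # Shift the signed range [-maxsize - 1, maxsize] up to [0, 2 * maxsize + 1].
    return h + sys.maxsize + 1


def wrapped(h):
    # Two's-complement reinterpretation: only negative values change.
    return h % ((sys.maxsize + 1) * 2)


for h in (-sys.maxsize - 1, -1, 0, 1, sys.maxsize):
    print(h, shifted(h), wrapped(h))
```

Either mapping is one-to-one over the hash range, so both would serve as a seed; they just assign different positive values to the same hash.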
(Edit: at first I thought you always wanted a 32-bit value)
Simply AND it with a mask of the desired size. Generally sys.maxsize will already be such a mask, since it’s a power of 2 minus 1.
import sys
assert (sys.maxsize & (sys.maxsize + 1)) == 0  # checks that maxsize + 1 is a power of 2
new_hash = hash(obj) & sys.maxsize  # obj is the object being hashed
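Masking with sys.maxsize keeps the low 31 or 63 bits and drops the sign bit, so the result lies in 0 to sys.maxsize rather than the full unsigned range. A sketch, assuming Python 3 and an illustrative helper name:

```python
import sys


def masked_hash(obj):
    # Python's & treats negative ints as two's complement with infinite
    # sign extension, so masking with a positive value is never negative.
    return hash(obj) & sys.maxsize


# Masking is equivalent to reducing modulo sys.maxsize + 1.
for h in (-sys.maxsize - 1, -2, 0, 7, sys.maxsize):
    assert (h & sys.maxsize) == h % (sys.maxsize + 1)

print(masked_hash(-2))   # equals sys.maxsize - 1
```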
import sys
# Largest positive value of the platform's native signed integer (equal to sys.maxsize)
max_int = 2**sys.maxsize.bit_length() - 1
# Reduce the hash to a non-negative value in the range 0 to max_int
hash_value = hash(obj) % (max_int + 1)
The expression 2**sys.maxsize.bit_length() - 1 calculates the maximum positive integer value for the target platform.
sys.maxsize holds the largest value a variable of type Py_ssize_t can take on the platform. The bit_length() method returns the number of bits required to represent that value in binary; the sign bit is not counted.
For example, on a 32-bit system, sys.maxsize has a value of 2147483647, which is the maximum value for a 32-bit signed integer. The bit_length() method returns 31, and 2**31 - 1 is 2147483647.
On a 64-bit system, sys.maxsize has a value of 9223372036854775807, which is the maximum value for a 64-bit signed integer. The bit_length() method returns 63, and 2**63 - 1 is 9223372036854775807.
Because the modulus is max_int + 1, the result always falls in the range 0 to max_int. To cover the full unsigned range instead (0 to 2**32-1 or 0 to 2**64-1), use 2 * (max_int + 1) as the modulus.
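The values involved are easy to check in an interpreter. The sketch below (Python 3) derives the native word size from sys.maxsize; bit_length() does not count the sign bit, so one is added. The variable names are illustrative only.

```python
import sys

value_bits = sys.maxsize.bit_length()   # 31 on 32-bit builds, 63 on 64-bit builds
word_bits = value_bits + 1              # add the sign bit: 32 or 64

# sys.maxsize is exactly the largest value representable in value_bits bits.
assert 2**value_bits - 1 == sys.maxsize

# Full unsigned range of the native word: 0 .. 2**word_bits - 1.
unsigned_hash = hash("example") % 2**word_bits
assert 0 <= unsigned_hash < 2**word_bits
print(word_bits, unsigned_hash)
```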