Deterministic hashing in Python 3

Question:

I’m using hashing of strings for seeding random states in the following way:

import numpy as np

context = "string"
seed = hash(context) % 4294967295  # Keep the hash within the allowed seed range
np.random.seed(seed)

This is unfortunately (for my usage) non-deterministic between runs in Python 3.3 and up. I do know that I could set the PYTHONHASHSEED environment variable to an integer value to regain the determinism, but I would probably prefer something that feels a bit less hacky, and won’t entirely disregard the extra security added by random hashing. Suggestions?

Asked By: Jimmy C

Answers:

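Use a purpose-built hash function. zlib.adler32() is a simple choice: it takes bytes and, in Python 3, returns an unsigned 32-bit integer, which fits the seed range directly. It is a checksum rather than a strong hash, though, and distributes poorly for short inputs. Alternatively, check out the hashlib module for more options.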
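
A minimal sketch of the zlib route, assuming UTF-8 encoding of the context string; the helper name adler_seed is hypothetical and not part of any library:

```python
import zlib


def adler_seed(context: str) -> int:
    """Return a run-to-run stable 32-bit seed for a string (hypothetical helper)."""
    # adler32 works on bytes, so the string is encoded first.
    # In Python 3 the result is always an unsigned 32-bit integer.
    return zlib.adler32(context.encode("utf-8"))


seed = adler_seed("string")
# The value can then be passed on, e.g. to np.random.seed(seed).
```

Unlike hash(), the result does not depend on PYTHONHASHSEED, so it stays the same across interpreter runs.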

Answered By: user149341

Forcing Python’s built-in hash to be deterministic is intrinsically hacky. If you want to avoid hackitude, use a different hashing function, e.g. one from hashlib; see https://docs.python.org/2/library/hashlib.html for Python 2 and https://docs.python.org/3/library/hashlib.html for Python 3.
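
One way the hashlib route might look, as a sketch only: it assumes SHA-256, UTF-8 encoding, and a 32-bit seed range as in the question. The helper name stable_seed is hypothetical, and any other hashlib digest would work the same way:

```python
import hashlib


def stable_seed(context: str) -> int:
    """Derive a 32-bit seed from a string via SHA-256 (hypothetical helper)."""
    digest = hashlib.sha256(context.encode("utf-8")).digest()
    # Keep only the first 4 bytes so the value fits in 0 .. 2**32 - 1.
    return int.from_bytes(digest[:4], "big")


seed = stable_seed("string")
# The value can then be passed on, e.g. to np.random.seed(seed).
```

Because the digest depends only on the input bytes, the seed is identical between runs and leaves Python's hash randomization untouched.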

Answered By: Alex Martelli

You can actually use a string as seed for random.Random:

>>> import random
>>> r = random.Random('string'); [r.randrange(10) for _ in range(20)]
[0, 6, 3, 6, 4, 4, 6, 9, 9, 9, 9, 9, 5, 7, 5, 3, 0, 4, 8, 1]
>>> r = random.Random('string'); [r.randrange(10) for _ in range(20)]
[0, 6, 3, 6, 4, 4, 6, 9, 9, 9, 9, 9, 5, 7, 5, 3, 0, 4, 8, 1]
>>> r = random.Random('string'); [r.randrange(10) for _ in range(20)]
[0, 6, 3, 6, 4, 4, 6, 9, 9, 9, 9, 9, 5, 7, 5, 3, 0, 4, 8, 1]
>>> r = random.Random('another_string'); [r.randrange(10) for _ in range(20)]
[8, 7, 1, 8, 3, 8, 6, 1, 6, 5, 5, 3, 3, 6, 6, 3, 8, 5, 8, 4]
>>> r = random.Random('another_string'); [r.randrange(10) for _ in range(20)]
[8, 7, 1, 8, 3, 8, 6, 1, 6, 5, 5, 3, 3, 6, 6, 3, 8, 5, 8, 4]
>>> r = random.Random('another_string'); [r.randrange(10) for _ in range(20)]
[8, 7, 1, 8, 3, 8, 6, 1, 6, 5, 5, 3, 3, 6, 6, 3, 8, 5, 8, 4]

This can be convenient, e.g. for using the basename of an input file as the seed. For the same input file, the generated numbers will always be the same.
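
The basename idea could be sketched as follows; the function name rng_for_file and the example paths are made up for illustration:

```python
import os
import random


def rng_for_file(path: str) -> random.Random:
    """Return a generator seeded with the file's basename (hypothetical helper)."""
    # Only the basename is used, so the same file name yields the same
    # sequence regardless of the directory it lives in.
    return random.Random(os.path.basename(path))


r1 = rng_for_file("/data/run1/input.csv")
r2 = rng_for_file("/backup/input.csv")
same = [r1.randrange(10) for _ in range(5)] == [r2.randrange(10) for _ in range(5)]
```

Here same is True because both paths share the basename input.csv. Each generator is independent of the module-level random state.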

Answered By: Eric Duminil