Murmur3 Hash Compatibility Between Go and Python

Question:

We have two libraries, one in Python and one in Go, that need to compute murmur3 hashes identically. Unfortunately, no matter what we try, we cannot get them to produce the same result. Judging by this SO question about Java and Python, compatibility isn't necessarily straightforward.

Right now we're using the Python mmh3 and Go github.com/spaolacci/murmur3 libraries.

In Go:

hash := murmur3.New128()
hash.Write([]byte("chocolate-covered-espresso-beans"))
fmt.Println(base64.RawURLEncoding.EncodeToString(hash.Sum(nil)))
// Output: cLHSo2nCBxyOezviLM5gwg

In Python:

name = "chocolate-covered-espresso-beans"
hash = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='big', signed=False)
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: jns74izOYMJwsdKjacIHHA (big byteorder)

hash = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='little', signed=False)
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: HAfCaaPSsXDCYM4s4jt7jg (little byteorder)

hash = mmh3.hash_bytes(name.encode('utf-8'))
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: HAfCaaPSsXDCYM4s4jt7jg

In Go, murmur3 returns uint64 values, so we assumed signed=False in Python; however, we also tried signed=True and did not get matching hashes.

We’re open to different libraries, but are wondering if there is something wrong with either our Go or Python methodologies of computing a base64 encoded hash from a string. Any help appreciated.

Asked By: bbengfort

Answers:

That first Python result is almost right.

>>> import base64, binascii
>>> binascii.hexlify(base64.b64decode('jns74izOYMJwsdKjacIHHA=='))
b'8e7b3be22cce60c270b1d2a369c2071c'

In Go:

    x, y := murmur3.Sum128([]byte("chocolate-covered-espresso-beans"))
    fmt.Printf("%x %x\n", x, y)

Results in:

70b1d2a369c2071c 8e7b3be22cce60c2

So the order of the two 64-bit words is flipped: the word Go returns second is the one that comes first in the big-endian Python bytes. To get the same result in Python, swap the two 8-byte halves:

import base64
import mmh3

name = "chocolate-covered-espresso-beans"
digest = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='big', signed=False)
# Swap the two 64-bit halves to match Go's word order.
digest = digest[8:] + digest[:8]
print(base64.urlsafe_b64encode(digest).decode('utf-8').strip("="))
# Output: cLHSo2nCBxyOezviLM5gwg
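If you'd rather work on the integer than slice bytes, the same swap can be sketched with only the standard library. This is a minimal illustration, not part of either library: the helper names go_style_digest, go_style_from_little_endian and encode are made up for the example, and the input is the 128-bit value from the hex dump above, standing in for what mmh3.hash128(..., signed=False) returned for this string.

```python
import base64
import struct

MASK64 = (1 << 64) - 1


def go_style_digest(hash128_value: int) -> bytes:
    """Pack an unsigned 128-bit hash as two big-endian 64-bit words, low word first.

    Hypothetical helper: the low 64 bits are the word Go printed first (x),
    the high 64 bits are the word Go printed second (y).
    """
    h1 = hash128_value & MASK64
    h2 = (hash128_value >> 64) & MASK64
    return struct.pack(">QQ", h1, h2)


def go_style_from_little_endian(raw: bytes) -> bytes:
    """Same result starting from the 16 little-endian bytes (the hash_bytes layout
    seen in the question): reverse each 8-byte half in place."""
    return raw[:8][::-1] + raw[8:][::-1]


def encode(digest: bytes) -> str:
    """URL-safe base64 without padding, as in the question."""
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


# 128-bit value taken from the hex dump above (assumed to equal the hash128 result).
value = 0x8E7B3BE22CCE60C270B1D2A369C2071C

digest = go_style_digest(value)
print(digest.hex())    # 70b1d2a369c2071c8e7b3be22cce60c2
print(encode(digest))  # cLHSo2nCBxyOezviLM5gwg

# The little-endian bytes lead to the same digest.
print(go_style_from_little_endian(value.to_bytes(16, "little")) == digest)  # True
```

Either way the idea is the same: emit the low 64-bit word first, then the high one, each in big-endian byte order.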
Answered By: kichik