Cheap mapping of string to small fixed-length string

Question:

Just for debugging purposes I would like to map a big string (a session_id, which is difficult to visualize) to a, let’s say, 6 character “hash”. This hash does not need to be secure in any way, just cheap to compute, and of fixed and reduced length (an MD5 hex digest, at 32 characters, is too long). The input string can have any length.

How would you implement this “cheap_hash” in Python so that it stays cheap to compute? It should generate something like this:

def compute_cheap_hash(txt, length=6):
    # do some computation
    return cheap_hash

print(compute_cheap_hash("SDFSGSADSADFSasdfgsadfSDASAFSAGAsaDSFSA2345435adfdasgsaed"))
aBxr5u
Asked By: blueFast


Answers:

import hashlib

def cheaphash(string, length=6):
    # hashlib works on bytes, so encode the text first (Python 3).
    digest = hashlib.sha256(string.encode("utf-8")).hexdigest()
    # A SHA-256 hex digest is 64 characters long; compute it only once.
    if length <= len(digest):
        return digest[:length]
    raise ValueError("Length too long. Length of {y} when hash length is {x}.".format(x=len(digest), y=length))

This should do what you need. It simply uses the hashlib module, so make sure to import it before using this function.
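The length check above depends on how long the hex digest of the chosen algorithm is. A small sketch for checking that limit for a few hashlib algorithms (the helper name max_hex_length is hypothetical, not part of hashlib):

```python
import hashlib

def max_hex_length(algorithm):
    # Hypothetical helper: each digest byte becomes two hex characters.
    return hashlib.new(algorithm).digest_size * 2

for name in ("md5", "sha1", "sha256"):
    print(name, max_hex_length(name))
```

So a SHA-256 based cheaphash can return up to 64 characters, while an MD5 based one would stop at 32.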

Answered By: IT Ninja

I found this similar question: https://stackoverflow.com/a/6048639/647991

So here is the function:

import hashlib

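def compute_cheap_hash(txt, length=6):
    # This is just a hash for debugging purposes.
    #    It does not need to be unique, just fast and short.
    # Use "h" rather than "hash" to avoid shadowing the built-in hash().
    h = hashlib.sha1()
    # hashlib works on bytes, so encode the text first (Python 3).
    h.update(txt.encode("utf-8"))
    return h.hexdigest()[:length]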
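Because this version feeds the input through update(), a very long input can also be hashed piece by piece; the hashlib documentation states that repeated update() calls are equivalent to a single call with the concatenated data. A minimal sketch, where the helper name cheap_hash_chunks is hypothetical:

```python
import hashlib

def cheap_hash_chunks(chunks, length=6):
    # Hypothetical helper: feed the input to SHA-1 one chunk at a time.
    h = hashlib.sha1()
    for chunk in chunks:
        h.update(chunk.encode("utf-8"))
    return h.hexdigest()[:length]

# Splitting the input differently does not change the result.
print(cheap_hash_chunks(["SDFSGSADSADF", "Sasdfgsadf"]))
print(cheap_hash_chunks(["SDFSGSADSADFSasdfgsadf"]))
```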
Answered By: blueFast

I can’t recall if MD5 is uniformly distributed, but it is designed to change a lot even for the smallest difference in the input.
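A quick way to see that behaviour is to compare the short prefixes of inputs that differ by a single character. A sketch, assuming a hypothetical helper md5_prefix:

```python
import hashlib

def md5_prefix(text, length=6):
    # Hypothetical helper: first `length` hex digits of the MD5 digest.
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:length]

# Inputs that differ in one character; compare the printed prefixes by eye.
for s in ("abc", "abd", "abc "):
    print(repr(s), md5_prefix(s))
```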

Don’t trust my math, but I guess the chance that two given inputs share the same first 6 digits of the MD5 hexdigest is 1 in 16^6 = 16,777,216, which is about 1 in 17 million.
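The 1 in 16^6 figure applies to one specific pair of inputs. When many session IDs are in play, the birthday problem makes it much more likely that some pair collides. A sketch of the arithmetic, assuming the prefixes behave like uniformly random hex strings (the function names are hypothetical):

```python
import math

def pair_collision_probability(hex_chars=6):
    # Chance that two specific inputs share the same hex prefix.
    return 1 / 16 ** hex_chars

def any_collision_probability(n_ids, hex_chars=6):
    # Birthday-problem approximation: 1 - exp(-n(n-1) / (2N)),
    # where N = 16**hex_chars is the number of possible prefixes.
    space = 16 ** hex_chars
    return 1 - math.exp(-n_ids * (n_ids - 1) / (2 * space))

for n in (100, 1000, 5000):
    print(n, round(any_collision_probability(n), 4))
```

With 1,000 IDs the approximation gives roughly a 3% chance that at least two of them share a 6 character prefix, which is usually fine for eyeballing debug logs.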

So you can just write cheap_hash = lambda s: hashlib.md5(s.encode("utf-8")).hexdigest()[:6].

After that you can use short_id = cheap_hash(any_input) anywhere.

PS: Any algorithm can be used; MD5 is slightly cheaper to compute but hashlib.sha256 is also a popular choice.
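As a hedged variant of the same idea, the algorithm can be passed by name through hashlib.new(). Hex output only uses 0-9 and a-f; the example “aBxr5u” in the question looks more like URL-safe base64, which packs 6 bits into each character instead of 4. The parameters algorithm and alphabet below are hypothetical, not taken from any answer:

```python
import base64
import hashlib

def compute_cheap_hash(txt, length=6, algorithm="md5", alphabet="hex"):
    # Hypothetical parameters: any hashlib algorithm name, "hex" or "base64".
    digest = hashlib.new(algorithm, txt.encode("utf-8"))
    if alphabet == "hex":
        out = digest.hexdigest()
    else:
        # URL-safe base64 uses A-Z, a-z, 0-9, '-' and '_'; drop the '=' padding.
        out = base64.urlsafe_b64encode(digest.digest()).decode("ascii").rstrip("=")
    if length > len(out):
        raise ValueError("length %d exceeds digest length %d" % (length, len(out)))
    return out[:length]

print(compute_cheap_hash("SDFSGSADSADFSasdfgsadfSDASAFSAGAsaDSFSA2345435adfdasgsaed"))
print(compute_cheap_hash("SDFSGSADSADFSasdfgsadfSDASAFSAGAsaDSFSA2345435adfdasgsaed", alphabet="base64"))
```

Six base64 characters carry 36 bits instead of 24, so the same display width gives far fewer accidental collisions.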

Answered By: Paulo Scardine