What is the best way to get a semi-long, unique, non-sequential ID key for database objects?

Question:

I am building a web app and I would like my URL scheme to look something like this:

someurl.com/object/FJ1341lj

Currently I just use the primary key from my SQLAlchemy objects, but the problem is that I don't want the URLs to be sequential or low numbers. For instance, my URLs currently look like this:

someurl.com/object/1
someurl.com/object/2
Asked By: DantheMan


Answers:

You could always take a hash of the id and then represent the resulting number in base 62 (0-9, a-z, A-Z).

import string
import hashlib

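def enc(val):
    # Base-62 alphabet: 0-9, a-z, A-Z (string.ascii_letters also exists on Python 3)
    chars = string.digits + string.ascii_letters
    num_chars = len(chars)
    r = ''
    while val != 0:
        r += chars[val % num_chars]
        val //= num_chars  # integer division, so val stays an int on Python 3
    return r

def fancy_id(i, hash_truncate=12):
    # sha1() needs bytes on Python 3, so encode the decimal string first
    h = hashlib.sha1(str(i).encode('ascii'))
    return enc(int(h.hexdigest()[:hash_truncate], 16))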

fancy_id(1) # 'XYY6dYFg'
fancy_id(2) # '6jxNvE961'

Similarly, a decode function could be written for the base 62 encoding. Because the hash itself cannot be reversed, you would have to store this generated URL id in your object so that you can map back from the URL id to the object.
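The decode function mentioned above might look like the following minimal sketch, written for Python 3. The name `dec` is hypothetical, and `enc` is repeated here with an added special case for 0 so the block runs on its own. Since `enc` emits the least significant digit first, `dec` reads the string from the end. It only recovers the truncated hash value, not the original id.

```python
import hashlib
import string

CHARS = string.digits + string.ascii_letters  # 62 symbols: 0-9, a-z, A-Z

def enc(val):
    """Base-62 encode a non-negative integer, least significant digit first."""
    if val == 0:
        return CHARS[0]  # assumption: represent zero as a single '0'
    r = ''
    while val != 0:
        val, digit = divmod(val, len(CHARS))
        r += CHARS[digit]
    return r

def dec(s):
    """Hypothetical inverse of enc(): rebuild the integer from its digits."""
    val = 0
    for ch in reversed(s):  # the most significant digit is the last character
        val = val * len(CHARS) + CHARS.index(ch)
    return val

def fancy_id(i, hash_truncate=12):
    h = hashlib.sha1(str(i).encode('ascii'))
    return enc(int(h.hexdigest()[:hash_truncate], 16))

truncated = int(hashlib.sha1(b'1').hexdigest()[:12], 16)
assert dec(fancy_id(1)) == truncated  # the round trip yields the hash, not the id
```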

Answered By: Preet Kukreti

A UUID is probably a little longer than you would like:

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import uuid
>>> uuid.uuid4()
UUID('ba587488-2a96-4daa-b422-60300eb86155')
>>> str(uuid.uuid4())
'001f8565-6330-44a6-977a-1cca201aedcc'
>>> 
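If the 36-character form is too long for a URL, one option is to encode the UUID's 16 raw bytes with URL-safe base64, which gives 22 characters. This is only a minimal Python 3 sketch; the helper names `uuid_to_slug` and `slug_to_uuid` are made up for illustration.

```python
import base64
import uuid

def uuid_to_slug(u):
    """Encode a UUID's 16 bytes as a 22-character URL-safe string."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b'=').decode('ascii')

def slug_to_uuid(slug):
    """Restore the UUID by re-adding the two stripped '=' padding characters."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(slug + '=='))

u = uuid.uuid4()
slug = uuid_to_slug(u)
assert len(slug) == 22
assert slug_to_uuid(slug) == u
```

The slug uses only letters, digits, `-` and `_`, so it needs no escaping in a URL path.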

And if you are using SQLAlchemy, you can define an id column of type UUID like so:

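from sqlalchemy import types
from sqlalchemy.schema import Column
import uuid


class UUID(types.TypeDecorator):
    # Store the UUID as 16 raw bytes. types.BINARY stands in for the
    # MySQL-specific MSBinary, which is not importable from recent SQLAlchemy releases.
    impl = types.BINARY
    cache_ok = True  # the type holds no per-instance state (used by SQLAlchemy 1.4+)

    def __init__(self):
        types.TypeDecorator.__init__(self, length=16)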

    def process_bind_param(self,value,dialect=None):
        if value and isinstance(value,uuid.UUID):
            return value.bytes
        elif value and not isinstance(value,uuid.UUID):
            raise ValueError('value %s is not a valid uuid.UUID' % (value,))
        else:
            return None

    def process_result_value(self,value,dialect=None):
        if value:
            return uuid.UUID(bytes=value)
        else:
            return None

    def is_mutable(self):
        return False


id_column_name = "id"

def id_column():
    return Column(id_column_name, UUID(), primary_key=True, default=uuid.uuid4)

If you are using Django, Preet’s answer is probably more appropriate, since a lot of Django’s stuff depends on primary keys that are ints.

Answered By: Tom Willis

Given your requirement, the best bet would be to use itertools.combinations, somewhat like this:

>>> import itertools, string
>>> urls = itertools.combinations(string.ascii_letters, 6)
>>> 'someurl.com/object/' + ''.join(next(urls))
'someurl.com/object/abcdef'
>>> 'someurl.com/object/' + ''.join(next(urls))
'someurl.com/object/abcdeg'
>>> 'someurl.com/object/' + ''.join(next(urls))
'someurl.com/object/abcdeh'
Answered By: Abhijit

Encoding the integers

You could use a reversible encoding for your integers:

def int_str(val, keyspace):
    """ Turn a positive integer into a string. """
    assert val >= 0
    out = ""
    while val > 0:
        val, digit = divmod(val, len(keyspace))
        out += keyspace[digit]
    return out[::-1]

def str_int(val, keyspace):
    """ Turn a string into a positive integer. """
    out = 0
    for c in val:
        out = out * len(keyspace) + keyspace.index(c)
    return out

Quick testing code:

keyspace = "fw59eorpma2nvxb07liqt83_u6kgzs41-ycdjh" # Can be anything you like - this was just shuffled letters and numbers, but...
assert len(set(keyspace)) == len(keyspace) # each character must occur only once

def test(v):
    s = int_str(v, keyspace)
    w = str_int(s, keyspace)
    print("OK? %r -- int_str(%d) = %r; str_int(%r) = %d" % (v == w, v, s, s, w))

test(1064463423090)
test(4319193500)
test(495689346389)
test(2496486533)

outputs

OK? True -- int_str(1064463423090) = 'antmgabi'; str_int('antmgabi') = 1064463423090
OK? True -- int_str(4319193500) = 'w7q0hm-'; str_int('w7q0hm-') = 4319193500
OK? True -- int_str(495689346389) = 'ev_dpe_d'; str_int('ev_dpe_d') = 495689346389
OK? True -- int_str(2496486533) = '1q2t4w'; str_int('1q2t4w') = 2496486533

Obfuscating them and making them non-contiguous

To make the IDs non-contiguous, you could, say, multiply the original value by some arbitrary value and add random “chaff” as the digits to be discarded. My example validates the chaff with a simple modulus check:

import random

def chaffify(val, chaff_size = 150, chaff_modulus = 7):
    """ Add chaff to the given positive integer.
    chaff_size defines how large the chaffing value is; the larger it is, the larger (and more unwieldy) the resulting value will be.
    chaff_modulus defines the modulus value for the chaff integer; the larger this is, the lower the chance that the chaff validation in dechaffify() yields a false "okay".
    """
    # Integer division keeps the upper bound an int on Python 3 as well.
    chaff = random.randint(0, chaff_size // chaff_modulus) * chaff_modulus
    return val * chaff_size + chaff

def dechaffify(chaffy_val, chaff_size = 150, chaff_modulus = 7):
    """ Dechaffs the given chaffed value. The chaff_size and chaff_modulus parameters must be the same as given to chaffify() for the dechaffification to succeed.
    If the chaff value has been tampered with, then a ValueError will (probably - not necessarily) be raised. """
    val, chaff = divmod(chaffy_val, chaff_size)
    if chaff % chaff_modulus != 0:
        raise ValueError("Invalid chaff in value")
    return val

for x in range(1, 11):
    chaffed = chaffify(x)
    print("%d %d %d" % (x, chaffed, dechaffify(chaffed)))

outputs (with randomness):

1 262 1
2 440 2
3 576 3
4 684 4
5 841 5
6 977 6
7 1197 7
8 1326 8
9 1364 9
10 1528 10

EDIT: On second thought, the randomness of the chaff may not be a good idea, as you lose the canonicality of each obfuscated ID. The following version lacks the randomness but still has validation (changing one digit will likely invalidate the whole number if chaff_val is Large Enough).

def chaffify2(val, chaff_val = 87953):
    """ Add chaff to the given positive integer. """
    return val * chaff_val

def dechaffify2(chaffy_val, chaff_val = 87953):
    """ Dechaffs the given chaffed value. chaff_val must be the same as given to chaffify2(). If the value does not seem to be correctly chaffed, raises a ValueError. """
    val, chaff = divmod(chaffy_val, chaff_val)
    if chaff != 0:
        raise ValueError("Invalid chaff in value")
    return val
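The validation in dechaffify2() can be exercised with a small check. This is a minimal Python 3 sketch that repeats the two functions so it runs on its own; the sample id 42 is arbitrary.

```python
def chaffify2(val, chaff_val=87953):
    """Add chaff to the given positive integer."""
    return val * chaff_val

def dechaffify2(chaffy_val, chaff_val=87953):
    """Remove the chaff, raising ValueError if the value is not a multiple of chaff_val."""
    val, chaff = divmod(chaffy_val, chaff_val)
    if chaff != 0:
        raise ValueError("Invalid chaff in value")
    return val

chaffed = chaffify2(42)  # 42 * 87953 = 3694026
assert dechaffify2(chaffed) == 42

try:
    dechaffify2(chaffed + 1)  # no longer a multiple of chaff_val
except ValueError:
    rejected = True
else:
    rejected = False
assert rejected
```

Consecutive ids still map to values exactly chaff_val apart, so this obscures the sequence rather than hiding it.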

Putting it all together

document_id = random.randint(0, 1000000)
url_fragment = int_str(chaffify(document_id), keyspace)  # int_str() needs the keyspace
print("URL for document %d: http://example.com/%s" % (document_id, url_fragment))
request_id = dechaffify(str_int(url_fragment, keyspace))
print("Requested: Document %d" % request_id)

outputs (with randomness)

URL for document 831274: http://example.com/w840pi
Requested: Document 831274
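If each document should always get the same URL, the non-random multiplier from the EDIT can take the place of chaffify(). A minimal Python 3 sketch under that assumption; `encode_id` and `decode_id` are hypothetical helper names, and the keyspace and multiplier are the ones used earlier in this answer.

```python
KEYSPACE = "fw59eorpma2nvxb07liqt83_u6kgzs41-ycdjh"  # keyspace from the test code
CHAFF_VAL = 87953  # multiplier used by chaffify2()

def int_str(val, keyspace=KEYSPACE):
    """Turn a positive integer into a string."""
    out = ""
    while val > 0:
        val, digit = divmod(val, len(keyspace))
        out += keyspace[digit]
    return out[::-1]

def str_int(val, keyspace=KEYSPACE):
    """Turn a string back into an integer."""
    out = 0
    for c in val:
        out = out * len(keyspace) + keyspace.index(c)
    return out

def encode_id(document_id):
    """Hypothetical helper: deterministic URL fragment for a positive id."""
    return int_str(document_id * CHAFF_VAL)

def decode_id(url_fragment):
    """Hypothetical helper: raises ValueError if the fragment fails validation."""
    val, chaff = divmod(str_int(url_fragment), CHAFF_VAL)
    if chaff != 0:
        raise ValueError("Invalid chaff in value")
    return val

fragment = encode_id(831274)
assert fragment == encode_id(831274)  # the same id always gives the same fragment
assert decode_id(fragment) == 831274
```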
Answered By: AKX