Generate a Unique String in Python/Django
Question:
What I want is to generate a string (key) of size 5 for each user on my website, something like a BBM PIN.
The key will contain numbers and uppercase English letters:
- AU1B7
- Y56AX
- M0K7A
How can I also be confident that the strings are unique, even if I generate millions of them?
How can I do this in the most Pythonic way possible?
Answers:
I'm not sure about any short, cryptic ways, but it can be implemented with a simple, straightforward function, assuming that you save all the generated strings in a set:
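import random

def generate(unique):
    # Uppercase letters and digits: 36 possible characters per position.
    chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
    while True:
        value = "".join(random.choice(chars) for _ in range(5))
        # Retry until we get a value that has not been handed out before.
        if value not in unique:
            unique.add(value)
            return value

unique = set()
for _ in range(10):
    generate(unique)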
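When a whole batch of codes is needed at once, an alternative to the retry loop is to sample distinct integers and convert each one to a five-character string. This is only a sketch and is not taken from the answer above; the names `CHARS`, `int_to_code` and `generate_batch` are made up for illustration, and uniqueness holds only within a single batch.

```python
import random

CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
CODE_LENGTH = 5

def int_to_code(n):
    """Write n as a fixed-width base-36 string using CHARS as the digits."""
    digits = []
    for _ in range(CODE_LENGTH):
        n, remainder = divmod(n, len(CHARS))
        digits.append(CHARS[remainder])
    return "".join(reversed(digits))

def generate_batch(count):
    """Return `count` distinct codes drawn in a single call."""
    space = len(CHARS) ** CODE_LENGTH  # 36 ** 5 possible codes
    # random.sample picks without replacement, so no integer repeats.
    return [int_to_code(n) for n in random.sample(range(space), count)]

codes = generate_batch(1000)
```

Codes from separate batches can still clash with each other, so previously issued codes would need to be tracked elsewhere.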
My favourite is
import uuid
uuid.uuid4().hex[:6].upper()
If you are using Django, you can set the unique constraint on this field to make sure it is unique. https://docs.djangoproject.com/en/dev/ref/models/fields/#django.db.models.Field.unique
import random
import string
size = 5
''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(size))
This will generate short codes, but they can be duplicated, so check that each one is unique in your database before saving.
def generate(size=5):
    code = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(size))
    # check_if_duplicate() stands for your own database lookup.
    if check_if_duplicate(code):
        return generate(size=size)
    return code
Or use Django's unique constraint and handle the resulting exceptions.
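The "unique constraint plus exception handling" idea can be sketched without Django by using the standard library's sqlite3 module. Everything here is an assumption made for illustration: the `users` table, the `insert_user` helper and the retry limit are hypothetical, and a Django project would catch the ORM's own integrity error rather than `sqlite3.IntegrityError`.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_uppercase + string.digits

def new_code(size=5):
    return "".join(secrets.choice(ALPHABET) for _ in range(size))

def insert_user(conn, name, make_code=new_code, max_attempts=10):
    """Insert a row, letting the UNIQUE constraint reject duplicate codes."""
    for _ in range(max_attempts):
        code = make_code()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO users (name, code) VALUES (?, ?)",
                    (name, code),
                )
            return code
        except sqlite3.IntegrityError:
            continue  # the code was already taken, so try another one
    raise RuntimeError("no free code found after several attempts")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, code TEXT UNIQUE)")
first_code = insert_user(conn, "alice")
```

Letting the database enforce uniqueness avoids the gap between checking for a duplicate and saving the row.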
From Python 3.6 you can use the secrets module to generate nice random strings.
https://docs.python.org/3/library/secrets.html#module-secrets
import secrets
print(secrets.token_hex(5))
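Note that `secrets.token_hex(5)` returns ten lowercase hexadecimal characters (two per random byte), not a five-character code. A minimal sketch that keeps the secrets module but matches the format in the question could use `secrets.choice` over an explicit alphabet; the names `ALPHABET` and `random_code` are illustrative only.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # A-Z and 0-9

def random_code(size=5):
    """Build a code by drawing each character with secrets.choice."""
    return "".join(secrets.choice(ALPHABET) for _ in range(size))

hex_token = secrets.token_hex(5)  # 5 random bytes as 10 hex characters
code = random_code()
```

This still gives no uniqueness guarantee on its own, so a duplicate check or a database constraint is needed as well.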
If you can afford to lose ‘0’, ‘1’, ‘8’ and ‘9’ in the generated codes (Base32 uses only the letters A to Z and the digits 2 to 7), there is a very Pythonic way of getting a random code.
import os
import base64
base64.b32encode(os.urandom(3))[:5].decode('utf-8')
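The following sketch looks a little closer at what that one-liner produces. Three random bytes carry 24 bits, so there are at most 2 ** 24 (about 16.7 million) distinct results, and the fifth character only encodes four of those bits. The constant `BASE32_ALPHABET` is written out here for checking and is not part of the answer.

```python
import base64
import os

BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

raw = os.urandom(3)               # 24 random bits
encoded = base64.b32encode(raw)   # 5 data characters followed by b"==="
code = encoded[:5].decode("utf-8")

# The last character holds 4 data bits plus one zero padding bit,
# so it always sits at an even position in the Base32 alphabet.
last_index = BASE32_ALPHABET.index(code[4])
distinct_codes = 2 ** 24
```

If the full 36-character alphabet matters, drawing characters directly from it may be a better fit than Base32.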
Since you are going for uniqueness, you have a problem: 36 * 36 * 36 * 36 * 36 = 60'466'176 possible codes,
which makes collisions very likely once you generate millions of them. One way to handle this is to keep the codes already issued in a set and retry on a collision:
some_set = set()

def generate():
    return base64.b32encode(os.urandom(3))[:5].decode('utf-8')

def generate_unique():
    string = generate()
    while string in some_set:
        string = generate()
    some_set.add(string)
    return string
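To put a rough number on how quickly collisions appear, the usual birthday approximation can be applied to a space of 36 ** 5 codes. This is a back-of-the-envelope sketch that assumes codes are drawn uniformly at random; `collision_probability` is a name chosen for this example.

```python
import math

CODE_SPACE = 36 ** 5  # 60,466,176 possible codes

def collision_probability(n, space=CODE_SPACE):
    """Approximate chance that n random codes contain at least one repeat."""
    exponent = n * (n - 1) / (2 * space)
    return -math.expm1(-exponent)  # same as 1 - exp(-exponent)

for n in (1_000, 10_000, 100_000):
    print(n, round(collision_probability(n), 4))
```

Under this approximation a repeat becomes more likely than not at around ten thousand codes, well before the first million, which is why the retry loop (or a database constraint) is needed.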
However, since uniqueness is usually more important, I’d recommend generating a unique code for each of the numbers from 0 to 36^5 - 1.
We can use a large prime and a modulo to turn each number into a pseudo-random-looking one, like this:
# The multiplier shares no factor with 36 ** 5, so the mapping is one-to-one.
prime_number = 60466181
# 36 characters, one for each possible value of hashed % 36.
characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890'

def num_to_code(n: int):
    string = ''
    hashed = hash_number(n)
    for x in range(5):
        charnumber = hashed % 36
        hashed = hashed // 36  # integer division, no float rounding
        string += characters[charnumber]
    return string

def hash_number(n: int, rounds=20):
    if rounds <= 0:
        return n
    hashed = (n * prime_number) % (36 ** 5)
    return hash_number(hashed, rounds - 1)

if __name__ == '__main__':
    code = num_to_code(1)
    print(code)
Here are the results for the numbers 0 to 5; the same input always produces the same code.
0 AAAAA (easily fixable ofc)
1 ZGQR9
2 ON797
3 DUMQ6
4 31384
5 R8IP3
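The reason this scheme cannot produce duplicates is that multiplying by a number that shares no factor with the modulus only rearranges the values 0 to 36^5 - 1. The sketch below checks that condition for the multiplier in the answer and verifies the one-to-one property exhaustively on a smaller space; `scramble` and the 36 ** 2 test space are assumptions made for the demonstration.

```python
import math

MODULUS = 36 ** 5
MULTIPLIER = 60466181  # value used in the answer above

# A multiplier with no common factor with the modulus gives a one-to-one map.
coprime = math.gcd(MULTIPLIER, MODULUS) == 1

def scramble(n, multiplier, modulus):
    return (n * multiplier) % modulus

# Exhaustive check on a smaller space, reusing the multiplier's residue.
small_modulus = 36 ** 2
small_multiplier = MULTIPLIER % MODULUS  # works out to 5
outputs = {scramble(n, small_multiplier, small_modulus)
           for n in range(small_modulus)}
all_distinct = len(outputs) == small_modulus
```

Because each round is just a multiplication by 5 modulo 36^5, the sequence is predictable to anyone who knows the scheme, so it suits uniqueness better than secrecy.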
There is a function in django that does what you’re looking for (credits to this answer):
Django provides the function get_random_string() which will satisfy the alphanumeric string generation requirement. You don’t need any extra package because it’s in the django.utils.crypto module.
>>> from django.utils.crypto import get_random_string
>>> unique_id = get_random_string(length=32)
>>> unique_id
u'rRXVe68NO7m3mHoBS488KdHaqQPD6Ofv'
You can also vary the set of characters with allowed_chars:
>>> short_genome = get_random_string(length=32, allowed_chars='ACTG')
>>> short_genome
u'CCCAAAAGTACGTCCGGCATTTGTCCACCCCT'
I have a unique field named ‘systemCode’ in a lot of my models. I generate it automatically, but sometimes it can also take its value from user input, so I have to check the value before saving and, if it matches an existing one, regenerate it as a unique value.
This is how I generate unique strings in this scenario.
This is my standard model class:
class ClassOne(models.Model):
    name = models.CharField(max_length=100)
    systemCode = models.CharField(max_length=25, blank=True, null=True, unique=True)
    ....
I use the save() method to generate the systemCode and check that it is unique:
    def save(self, *args, **kwargs):
        systemCode = self.systemCode
        if not systemCode:
            systemCode = uuid.uuid4().hex[:6].upper()
        while ClassOne.objects.filter(systemCode=systemCode).exclude(pk=self.pk).exists():
            systemCode = uuid.uuid4().hex[:6].upper()
        self.systemCode = systemCode
        super(ClassOne, self).save(*args, **kwargs)
But I have the same systemCode field in all my models, so I use a function to generate the value.
This is how to generate a unique value for all models using the saveSystemCode() function:
import uuid

def saveSystemCode(inClass, inCode, inPK, prefix):
    systemCode = inCode
    if not systemCode:
        systemCode = uuid.uuid4().hex[:6].upper()
    while inClass.objects.filter(systemCode=systemCode).exclude(pk=inPK).exists():
        systemCode = uuid.uuid4().hex[:6].upper()
    return systemCode

class ClassOne(models.Model):
    name = models.CharField(max_length=100)
    systemCode = models.CharField(max_length=25, blank=True, null=True, unique=True)
    ....
    def save(self, *args, **kwargs):
        self.systemCode = saveSystemCode(ClassOne, self.systemCode, self.pk, 'one_')
        super(ClassOne, self).save(*args, **kwargs)

class ClassTwo(models.Model):
    name = models.CharField(max_length=100)
    systemCode = models.CharField(max_length=25, blank=True, null=True, unique=True)
    ....
    def save(self, *args, **kwargs):
        self.systemCode = saveSystemCode(ClassTwo, self.systemCode, self.pk, 'two_')
        super(ClassTwo, self).save(*args, **kwargs)

class ClassThree(models.Model):
    name = models.CharField(max_length=100)
    systemCode = models.CharField(max_length=25, blank=True, null=True, unique=True)
    ....
    def save(self, *args, **kwargs):
        self.systemCode = saveSystemCode(ClassThree, self.systemCode, self.pk, 'three_')
        super(ClassThree, self).save(*args, **kwargs)
The while loop in the ‘saveSystemCode’ function prevents the same value from being saved again.
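The same check-and-regenerate loop can be written once, independently of any model class, by passing the lookup in as a callable. This is a framework-free sketch, not the author's code: `pick_unused_code`, `is_taken` and `max_attempts` are hypothetical names, and in Django `is_taken` might wrap a `filter(...).exists()` query like the one above.

```python
import uuid

def default_code():
    return uuid.uuid4().hex[:6].upper()

def pick_unused_code(is_taken, candidate=None, make_code=default_code,
                     max_attempts=100):
    """Return `candidate` if it is free, otherwise a freshly generated code."""
    code = candidate or make_code()
    for _ in range(max_attempts):
        if not is_taken(code):
            return code
        code = make_code()
    raise RuntimeError("could not find an unused code")

# Usage with an in-memory set standing in for the database lookup.
existing = {"ABC123"}
code = pick_unused_code(lambda c: c in existing, candidate="ABC123")
```

Two requests running at the same time could both pass the check before either saves, so keeping unique=True on the field remains the final safeguard.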
A more secure and shorter way of doing this is to use Django’s crypto module.
from django.utils.crypto import get_random_string
code = get_random_string(5)
The get_random_string() function returns a securely generated random string; it uses the secrets module under the hood.
You can also pass allowed_chars:
from django.utils.crypto import get_random_string
import string
code = get_random_string(5, allowed_chars=string.ascii_uppercase + string.digits)
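To generate a short code from a UUID you can use the command below. Note that the first five characters of a UUID are not guaranteed to be unique on their own: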
import uuid
str(uuid.uuid1())[:5]
If you have a way of associating each user with a unique ID (for example, the primary key in Django or Flask), you can do something like this.
Note: This does not generate a fixed length. We will pad the user_id on the left with zeros to make the generated length more consistent:
import os
import base64
user_id = 1
#pad the string
number_generate = str(user_id).rjust(5,"0")
base64.b32encode(bytes(number_generate, 'utf-8')).decode('utf-8').replace('=','')
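Because this approach encodes the ID rather than randomising it, the code can be turned back into the original user ID. The sketch below wraps the expression above in a function and adds a decoder; `encode_user_id` and `decode_user_id` are names invented for this example.

```python
import base64

def encode_user_id(user_id, width=5):
    """Zero-pad the ID on the left, then Base32-encode its digits."""
    padded = str(user_id).rjust(width, "0")
    encoded = base64.b32encode(padded.encode("utf-8")).decode("utf-8")
    return encoded.replace("=", "")

def decode_user_id(code):
    """Restore the '=' padding that was stripped and recover the integer ID."""
    padding = "=" * (-len(code) % 8)
    return int(base64.b32decode(code + padding).decode("utf-8"))

code = encode_user_id(1)
```

A five-digit padded ID encodes to eight characters rather than five, and longer IDs give longer codes. The result is unique as long as the user IDs are, but it is not secret.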
Here is a solution that generates codes of length 5 (or any other length) and writes them to a file:
import shortuuid as su

n = int(input("# codes to gen: "))
l = int(input("code length: "))
shou = su.ShortUUID(alphabet="QWERTYUIOPASDFGHJKLZXCVBNM0123456789")
codes = set()
LEN_CNT = 0
with open('file.txt', 'w') as file:
    while len(codes) < n:
        cd = shou.random(length=l)
        codes.add(cd)
        # Write the code only if it was new, i.e. the set actually grew.
        if len(codes) > LEN_CNT:
            LEN_CNT = len(codes)
            file.write(f"{cd}\n")
(shortuuid sometimes generates duplicate codes, so I use a set to deal with that.)
At the time of writing this answer, there is an actively maintained package that generates short UUIDs:
https://github.com/skorokithakis/shortuuid
For Django support, have a look here: