Python – Why use anything other than uuid4() for unique strings?

Question:

I see quite a few implementations of unique string generation for things like uploaded image names, session IDs, etc., and many of them use hashes such as SHA-1.

I’m not questioning the legitimacy of using custom methods like this, but rather just the reason. If I want a unique string, I just say this:

>>> import uuid
>>> uuid.uuid4()
UUID('07033084-5cfd-4812-90a4-e4d24ffb6e3d')

And I’m done with it. I wasn’t very trusting before I read up on uuid, so I did this:

>>> import uuid
>>> s = set()
>>> for i in range(5000000):  # That's 5 million!
...     s.add(str(uuid.uuid4()))
...
>>> len(s)
5000000

Not one repeat (I wouldn’t expect one, since a version 4 UUID has 122 random bits, or about 5.3e+36 possible values, but it’s comforting to see it in action). You could make a collision even less likely by combining two uuid4()s into one string.
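To put a rough number on that experiment, here is a minimal sketch using the standard birthday approximation. It assumes every UUID draws 122 independent, uniformly random bits; the function name `collision_probability` is made up for illustration.

```python
import math

# A version 4 UUID has 128 bits, 6 of which are fixed (version and variant).
RANDOM_BITS = 122


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Approximate chance of at least one duplicate among n random values."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * space)).
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-n * (n - 1) / (2 * space))


print(collision_probability(5_000_000))
```

For 5 million IDs the result is on the order of 1e-24, which is why the set above never shrinks.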

So, with that said, why do people spend time on random() and other approaches for unique strings? Is there an important security issue or some other problem with uuid?

Asked By: orokusaki


Answers:

One possible reason is that you want the unique string to be human-readable. UUIDs just aren’t easy to read.

Answered By: Jason Baker

Well, sometimes you want collisions. If someone uploads the same exact image twice, maybe you’d rather tell them it’s a duplicate rather than just make another copy with a new name.
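A minimal sketch of that idea, assuming uploads arrive as bytes and using an in-memory dict as a stand-in for real storage. `store_upload` and `stored` are hypothetical names, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib

# Hypothetical in-memory store: content hash -> file bytes.
stored = {}


def store_upload(data: bytes):
    """Store data under its content hash; return (name, is_duplicate)."""
    name = hashlib.sha256(data).hexdigest()
    is_duplicate = name in stored
    if not is_duplicate:
        stored[name] = data
    return name, is_duplicate


first = store_upload(b"fake image bytes")
second = store_upload(b"fake image bytes")
print(first[1], second[1])  # False True
```

With uuid4() names, the second upload would simply get a fresh name and the duplicate would go unnoticed.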

Answered By: Ben Voigt

UUIDs are long and meaningless (for instance, if you order by UUID, you get a meaningless result).

And because they’re so long, I wouldn’t want to put one in a URL or expose it to the user in any shape or form.
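If length is the main objection, one possible workaround is to encode the UUID's 16 raw bytes as URL-safe base64, which gives 22 characters instead of 36. This is only a sketch; `short_id` and `from_short_id` are hypothetical helpers, not part of the uuid module.

```python
import base64
import uuid


def short_id(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes as URL-safe base64 without padding."""
    # 16 bytes encode to 24 characters ending in "=="; dropping the padding leaves 22.
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def from_short_id(token: str) -> uuid.UUID:
    """Reverse short_id by restoring the two padding characters."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(token + "=="))


u = uuid.uuid4()
token = short_id(u)
print(len(str(u)), len(token))  # 36 22
```

The result is still meaningless to a human, so this only addresses the length complaint.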

Answered By: hasen

Using a hash to uniquely identify a resource allows you to generate a ‘unique’ reference from the object itself. For instance, Git uses SHA-1 hashing to derive an identifier that represents the exact contents of a single commit. Since hashing is deterministic, you’ll get the same hash for the same file every time.

Two people across the world could make the same change to the same repo independently, and Git would know they made the same change. UUID v1, v2, and v4 can’t support that since they have no relation to the file or the file’s contents.
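The difference can be sketched in a few lines. Note the assumptions: this hashes the raw bytes directly, which is not Git's actual object format (Git prepends a header before hashing), and the URL passed to uuid5() is a made-up example. Versions 3 and 5, which the list above leaves out, are the name-based UUIDs, so they are deterministic too.

```python
import hashlib
import uuid


def content_id(data: bytes) -> str:
    """Identifier derived only from the bytes themselves."""
    return hashlib.sha1(data).hexdigest()


def name_based_id(name: str) -> uuid.UUID:
    """Version 5 UUID: a hash of a namespace plus a name, so it is repeatable."""
    return uuid.uuid5(uuid.NAMESPACE_URL, name)


change = b"the same change, made independently"
print(content_id(change) == content_id(change))  # True
print(name_based_id("https://example.com/a") == name_based_id("https://example.com/a"))  # True
print(uuid.uuid4() == uuid.uuid4())  # two random IDs; equal only by astronomical coincidence
```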

Answered By: Arion

In addition to the other answers, hashes are really good for things that should be immutable. The name is unique and can be used to check the integrity of whatever it is attached to at any time.
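As a rough illustration of the integrity check, assume each item is named with the SHA-256 hex digest of its contents. `is_intact` is a hypothetical helper name.

```python
import hashlib


def is_intact(name: str, data: bytes) -> bool:
    """True if data still hashes to the name it was stored under."""
    return hashlib.sha256(data).hexdigest() == name


original = b"contents that should never change"
name = hashlib.sha256(original).hexdigest()

print(is_intact(name, original))         # True
print(is_intact(name, original + b"!"))  # False
```

A uuid4() name carries no such information, so corruption would have to be detected some other way.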

Answered By: David K. Hess

Also note that other kinds of UUID could be appropriate. For example, if you want your identifier to be orderable, UUID1 is based in part on a timestamp. It’s all really about your application requirements…
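One caveat worth a sketch: the text form of a version 1 UUID starts with the low bits of the timestamp, so sorting the strings does not give creation order. Sorting on the `time` attribute does. `sort_by_creation_time` is a hypothetical helper name.

```python
import uuid


def sort_by_creation_time(ids):
    """Order version 1 UUIDs by their embedded 60-bit timestamp."""
    return sorted(ids, key=lambda u: u.time)


ids = [uuid.uuid1() for _ in range(3)]
for u in sort_by_creation_time(ids):
    # time counts 100-nanosecond intervals since 1582-10-15.
    print(u, u.time)
```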

Answered By: jsh