What is the correct regex for matching values generated by uuid.uuid4().hex?

Question:

How do I validate that a value is equal to the UUID4 generated by this code?

uuid.uuid4().hex

Should it be some regular expression? The generated values are 32-character-long strings of this form:

60e3bcbff6c1464b8aed5be0fce86052
Asked By: LA_


Answers:

Easy enough:

import re
uuid4hex = re.compile(r'[0-9a-f]{32}\Z', re.I)

This matches only strings that are exactly 32 hexadecimal characters long, provided you use the .match() method, which anchors the match at the start of the string (see .search() vs. .match()). The \Z matches only at the very end of the string, whereas $ would also match just before a trailing newline.
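
To make the difference between $ and \Z, and between .match() and .search(), concrete, here is a small sketch. The pattern names HEX32_DOLLAR and HEX32_END are made up for this illustration; the sample value is the one from the question.

```python
import re

# Hypothetical names, used only for this comparison.
HEX32_DOLLAR = re.compile(r'[0-9a-f]{32}$', re.I)
HEX32_END = re.compile(r'[0-9a-f]{32}\Z', re.I)

value = '60e3bcbff6c1464b8aed5be0fce86052'

# Both patterns accept the bare value.
print(bool(HEX32_DOLLAR.match(value)))         # True
print(bool(HEX32_END.match(value)))            # True

# "$" also matches just before a trailing newline; "\Z" does not.
print(bool(HEX32_DOLLAR.match(value + '\n')))  # True
print(bool(HEX32_END.match(value + '\n')))     # False

# .search() scans the whole string, so leading junk slips through;
# .match() only tries at the start of the string.
print(bool(HEX32_END.search('xx' + value)))    # True
print(bool(HEX32_END.match('xx' + value)))     # False
```

If you would rather not depend on which method is called, pattern.fullmatch(value) requires the whole string to match and needs no end anchor.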

Answered By: Martijn Pieters

As far as I know, Martijn’s answer is not 100% correct. A UUID-4 has five groups of hexadecimal characters: the first has 8 characters, the second, third and fourth have 4 each, and the fifth has 12.

However, to make it a valid UUID4, the third group (the one in the middle) must start with a 4:

00000000-0000-4000-0000-000000000000
              ^

And the fourth group must start with 8, 9, a or b.

00000000-0000-4000-a000-000000000000
              ^    ^

So you have to change Martijn’s regex to:

import re
uuid4hex = re.compile(r'[0-9a-f]{12}4[0-9a-f]{3}[89ab][0-9a-f]{15}\Z', re.I)
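
A quick way to convince yourself of the two fixed positions is to check them on freshly generated values. This is only a sketch: has_uuid4_markers is a hypothetical helper, and indexes 12 and 16 are where the third and fourth groups begin once the dashes are dropped.

```python
import uuid

def has_uuid4_markers(hex_value):
    """Check the version and variant characters of a 32-character hex string."""
    return (
        len(hex_value) == 32
        and hex_value[12] == '4'       # first character of the third group
        and hex_value[16] in '89ab'    # first character of the fourth group
    )

# Every value produced by uuid.uuid4().hex carries both markers.
samples = [uuid.uuid4().hex for _ in range(1000)]
print(all(has_uuid4_markers(s) for s in samples))  # True

# 32 hex characters alone are not enough.
print(has_uuid4_markers('0' * 32))                 # False
```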
Answered By: a guy named guy

To be more specific, this is a more precise regex that matches a UUID4 both with and without dashes and follows all the rules of UUID4:

[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}

You can make it accept capital letters as well by ignoring case, as my example does with re.I. (Python's uuid output never contains capital letters, but uppercase input is not an error: in a UUID, “f” and “F” mean the same thing.)

I created a validator that looks like this:

def valid_uuid(uuid):
    regex = re.compile(r'^[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}\Z', re.I)
    match = regex.match(uuid)
    return bool(match)

Then you can do:

if valid_uuid(my_uuid):
    # Do stuff with the valid my_uuid
    pass

With ^ at the start and \Z at the end I also make sure there is nothing else in the string. This ensures that “3fc3d0e9-1efb-4eef-ace6-d9d59b62fec5” returns True, but “3fc3d0e9-1efb-4eef-ace6-d9d59b62fec5+19187” returns False.

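Update: the Python way below is not foolproof (see comments). UUID() accepts any UUID version and tolerates braces and a urn:uuid: prefix, so on its own it does not confirm that the input is a UUID4.

There are other ways to validate a UUID. In Python: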

from uuid import UUID
try:
    UUID(my_uuid)
    # my_uuid is valid and you can use it
except ValueError:
    # do what you need when my_uuid is not a UUID
    pass
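
One way to tighten the UUID-based check is to parse the value and then compare it with its own canonical form. The sketch below is not part of the original answer. The helper name is_uuid4_hex is hypothetical, and it is deliberately strict: it accepts only the dash-free form produced by uuid.uuid4().hex.

```python
from uuid import UUID

def is_uuid4_hex(value):
    """Return True only for the 32-character hex form of a version 4 UUID."""
    try:
        parsed = UUID(hex=value)
    except (ValueError, TypeError, AttributeError):
        # Not parseable as a UUID at all (wrong length, non-hex, not a string).
        return False
    # .version is None unless the variant bits are the RFC 4122 ones, and
    # the round trip through .hex rejects dashes, braces and prefixes.
    return parsed.version == 4 and parsed.hex == value.lower()

print(is_uuid4_hex('60e3bcbff6c1464b8aed5be0fce86052'))      # True
print(is_uuid4_hex('0' * 32))                                # False
print(is_uuid4_hex('3fc3d0e9-1efb-4eef-ace6-d9d59b62fec5'))  # False (dashes)

# Passing version=4 does not validate anything: it overwrites the
# version and variant bits of whatever was parsed.
print(UUID('0' * 32, version=4))  # 00000000-0000-4000-8000-000000000000
```

If dashed input should also pass, comparing str(parsed) with the lowercased input is a possible variation.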
Answered By: Christoffer

As a note on performance: I timed both approaches, and the regex validation is slightly faster:

import re
from uuid import UUID


def _validate_uuid4(uuid_string):
    try:
        UUID(uuid_string, version=4)
    except ValueError:
        return False
    return True

def _validate_uuid4_re(uuid_string):
    uuid4hex = re.compile(r'^[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}\Z', re.I)
    match = uuid4hex.match(uuid_string)
    return bool(match)

In an IPython session:

In [58]: val = str(uuid.uuid4())

In [59]: %time _validate_uuid4(val)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 30.3 µs
Out[59]: True

In [60]: %time _validate_uuid4_re(val)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 25.3 µs
Out[60]: True

In [61]: val = "invalid_uuid"

In [62]: %time _validate_uuid4(val)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 29.3 µs
Out[62]: False

In [63]: %time _validate_uuid4_re(val)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 25.5 µs
Out[63]: False
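
A single %time call is noisy at this scale, so averaging over many calls may give a more trustworthy comparison. Below is a minimal sketch using the standard timeit module. The names UUID4_RE, validate_with_uuid and validate_with_regex are hypothetical, and the pattern is compiled once at module level rather than on every call. No timings are shown because they depend on the machine.

```python
import re
import timeit
import uuid
from uuid import UUID

# Compiled once, outside the function being timed.
UUID4_RE = re.compile(
    r'[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}\Z',
    re.I,
)

def validate_with_uuid(value):
    # Same logic as the UUID-based function above.
    try:
        UUID(value, version=4)
    except ValueError:
        return False
    return True

def validate_with_regex(value):
    return bool(UUID4_RE.match(value))

val = str(uuid.uuid4())
runs = 20_000
for func in (validate_with_uuid, validate_with_regex):
    seconds = timeit.timeit(lambda: func(val), number=runs)
    print(f'{func.__name__}: {seconds / runs * 1e6:.2f} µs per call')
```

Note that the two functions are not equivalent: UUID(value, version=4) overwrites the version bits instead of checking them, so it returns True for any parseable UUID string.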

Answered By: ralfzen