What does "?:" mean in a Python regular expression?

Question:

Below is the Python regular expression. What does the ?: mean in it? What does the expression do overall? How does it match a MAC address such as “00:07:32:12:ac:de:ef“?

re.compile(r'([\dA-Fa-f]{2}(?:[:-][\dA-Fa-f]{2}){5})', string)
Asked By: Hari

Answers:

Using ?: as in (?:...) makes the group non-capturing, which matters during a replace because the group gets no backreference number. During a find it doesn’t change what is matched.
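As a rough illustration of the replace case, here is a minimal sketch. The sample string and variable names are made up for this example; only `re.sub` and the character classes come from the question. It shows that a non-capturing separator group does not take up a backreference number in the replacement string.

```python
import re

pair = "ac-de"  # hypothetical sample, not from the question

# Capturing separator: the second hex pair is group 3.
joined_a = re.sub(r"([\dA-Fa-f]{2})([:-])([\dA-Fa-f]{2})", r"\1\3", pair)

# Non-capturing separator: the second hex pair is group 2.
joined_b = re.sub(r"([\dA-Fa-f]{2})(?:[:-])([\dA-Fa-f]{2})", r"\1\2", pair)

print(joined_a, joined_b)  # acde acde
```

Both calls produce the same result; the only difference is which group numbers the replacement has to use.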

Your RegEx means

r"""
(                   # Match the regular expression below and capture its match into backreference number 1
   [\dA-Fa-f]         # Match a single character present in the list below
                          # A single digit 0..9
                          # A character in the range between “A” and “F”
                          # A character in the range between “a” and “f”
      {2}                 # Exactly 2 times
   (?:                 # Match the regular expression below
      [:-]                # Match a single character present in the list below
                             # The character “:”
                             # The character “-”
      [\dA-Fa-f]         # Match a single character present in the list below
                             # A single digit 0..9
                             # A character in the range between “A” and “F”
                             # A character in the range between “a” and “f”
         {2}                 # Exactly 2 times
   ){5}                # Exactly 5 times
)
"""

Hope this helps.
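An annotated breakdown like the one above can be turned into a working pattern with the re.VERBOSE flag, which ignores whitespace and # comments outside character classes. The following is only a sketch with shortened comments; the name MAC_VERBOSE is chosen for this example.

```python
import re

MAC_VERBOSE = re.compile(r"""
    (                      # group 1: the whole address
      [\dA-Fa-f]{2}        # first pair of hex digits
      (?:                  # non-capturing group, gets no number
        [:-]               # separator: colon or dash
        [\dA-Fa-f]{2}      # another pair of hex digits
      ){5}                 # repeated exactly 5 times
    )
""", re.VERBOSE)

m = MAC_VERBOSE.match("00:07:32:12:ac:de:ef")
first_six = m.group(1) if m else None
print(first_six)           # 00:07:32:12:ac:de
print(MAC_VERBOSE.groups)  # 1
```

The pattern reports a single capturing group because the inner (?:...) is not counted.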

Answered By: Cylian

(?:...) is a set of non-capturing grouping parentheses.

Normally, when you write (...) in a regex, it ‘captures’ the matched material. When you use the non-capturing version, it doesn’t capture.

You can get at the various parts matched by the regex using the methods in the re package after the regex matches against a particular string.
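To make the difference concrete, here is a small sketch comparing the question's pattern with a variant whose inner group is capturing. The variant and the variable names are assumptions added for illustration.

```python
import re

sample = "00:07:32:12:ac:de:ef"

# Inner group captures: it becomes group 2.
capturing = re.compile(r"([\dA-Fa-f]{2}([:-][\dA-Fa-f]{2}){5})")
# Inner group is non-capturing: only group 1 exists.
non_capturing = re.compile(r"([\dA-Fa-f]{2}(?:[:-][\dA-Fa-f]{2}){5})")

groups_a = capturing.match(sample).groups()
groups_b = non_capturing.match(sample).groups()

print(groups_a)  # ('00:07:32:12:ac:de', ':de')
print(groups_b)  # ('00:07:32:12:ac:de',)
```

A repeated capturing group keeps only its last iteration, which is why group 2 holds ':de' in the first case.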


How does this regular expression match MAC address “00:07:32:12:ac:de:ef”?

That’s a different question from what you initially asked. However, the regex part is:

([\dA-Fa-f]{2}(?:[:-][\dA-Fa-f]{2}){5})

The outermost pair of parentheses are capturing parentheses; what they surround will be available when you use the regex against a string successfully.

The [\dA-Fa-f]{2} part matches a digit (\d) or one of the hexadecimal letters [A-Fa-f], in a pair ({2}). It is followed by a non-capturing group whose content is a colon or dash (: or -) followed by another pair of hex digits, with that whole group repeated exactly 5 times.

import re

p = re.compile(r'([\dA-Fa-f]{2}(?:[:-][\dA-Fa-f]{2}){5})')
m = p.match("00:07:32:12:ac:de:ef")
if m:
    print(m.group(1))

The last line should print the string “00:07:32:12:ac:de” because that is the first set of 6 pairs of hex digits (out of the seven pairs in total in the string). In fact, the outer grouping parentheses are redundant and if omitted, m.group(0) would work (it works even with them). If you need to match 7 pairs, then you change the 5 into a 6. If you need to reject strings that have extra pairs, then you’d put anchors into the regex:

p = re.compile(r'^([\dA-Fa-f]{2}(?:[:-][\dA-Fa-f]{2}){5})$')

The caret ^ matches the start of string; the dollar $ matches the end of string. With the 5, that would not match your sample string. With 6 in place of 5, it would match your string.
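The effect of the anchors and of the repeat count can be checked with a short sketch like this one. The variable names are invented for the example; the patterns are the anchored regex with {5} and {6}.

```python
import re

sample = "00:07:32:12:ac:de:ef"  # seven pairs, as in the question

six_pairs = re.compile(r"^([\dA-Fa-f]{2}(?:[:-][\dA-Fa-f]{2}){5})$")
seven_pairs = re.compile(r"^([\dA-Fa-f]{2}(?:[:-][\dA-Fa-f]{2}){6})$")

matches_six = six_pairs.match(sample) is not None
matches_seven = seven_pairs.match(sample) is not None
matches_short = six_pairs.match("00:07:32:12:ac:de") is not None

print(matches_six, matches_seven, matches_short)  # False True True
```

Note that $ also matches just before a trailing newline; if that matters, Pattern.fullmatch is a stricter alternative.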

Answered By: Jonathan Leffler

(?:...) means a non-capturing group. The group will not be captured.

Answered By: Prince John Wesley

It does not change the search process. But it affects the retrieval of the group after the match has been found.

For example:
Text:
text = 'John Wick'

pattern to find:
regex = re.compile(r'John(?:\sWick)')  # here we are looking for 'John' and also for a group (space + Wick); the ?: makes this group unretrievable.

When we print the match – nothing changes:
<re.Match object; span=(0, 9), match='John Wick'>

But if you try to manually address the group with (?:) syntax:
res = regex.finditer(text)
for i in res:
    print(i)
    print(i.group(1))  # here we are trying to retrieve the (?:\sWick) group

it gives us an error:

IndexError: no such group

Also, look:

Python docs:

(?:…)
A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

the link to the re page in docs:
https://docs.python.org/3/library/re.html
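The "referenced later in the pattern" part of that quote can be seen with a backreference. This is a minimal sketch with made-up sample text: \1 refers to capturing group 1, so it works after (...) but fails to compile after (?:...) because no group 1 exists.

```python
import re

# Capturing group: \1 must repeat the same hex digit.
doubled = re.compile(r"([\dA-Fa-f])\1")
first_double = doubled.search("00:07").group(0)
print(first_double)  # 00

# Non-capturing group: there is no group 1 for \1 to refer to.
backref_failed = False
try:
    re.compile(r"(?:[\dA-Fa-f])\1")
except re.error as exc:
    backref_failed = True
    print("compile failed:", exc)
```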

Answered By: Akira Asahi