Problem about regular expression not matching with "\1"

Question:

I want to match a string that begins with ‘c’ or ends with ‘c’,
so I wrote a regular expression like this:

import re
reg = re.compile(r'^(c)|\1$')
reg.search('ca') # match
reg.search('ac') # not match

Actually, the ‘c’ part of the regexp is a complex sub-pattern like ‘x|y|z|[0-9_]|…’,
and I don’t want to write it twice in one regexp.
I thought it would work by referring back to the group with '\1',
but I don’t know why it doesn’t.

I also tried a named group, like this:

reg = re.compile(r'^(?P<name>c)|(?P=name)$')

and it doesn’t work either.

Asked By: vassiliev


Answers:

In the case of 'ac', the first group doesn’t match, so \1 has nothing to refer to later in the regular expression. The same happens with named groups.

This issue is not specific to Python 3.7.
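The behaviour described above can be checked directly. This is only a small sketch using the two patterns from the question; the variable names are made up for illustration.

```python
import re

# Pattern from the question: '^(c)' OR '\1$'.
reg = re.compile(r'^(c)|\1$')

# The first branch matches, so group 1 captures 'c'.
m = reg.search('ca')
starts_with_c = m is not None and m.group(1) == 'c'

# For 'ac' the first branch fails at every position, so group 1 is
# still unset when the second branch is tried and \1 cannot match.
ends_with_c = reg.search('ac')

# The named-group spelling behaves the same way.
named = re.compile(r'^(?P<name>c)|(?P=name)$')
named_result = named.search('ac')

print(starts_with_c, ends_with_c, named_result)
```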

Answered By: Chris

This is not how regex backreferences work. Their purpose is to match the exact same substring again later in the same string, not to reuse the pattern inside the group.

For example, the regex ([ab])\1 will match 'aa' and 'bb' but neither 'ab' nor 'ba'.
Until capturing group 1 has matched, \1 is meaningless.
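A quick way to see this, as a minimal sketch (the names `pair` and `results` are just for illustration):

```python
import re

# \1 must repeat the text that group 1 captured, not the pattern [ab].
pair = re.compile(r'([ab])\1')

results = {s: bool(pair.fullmatch(s)) for s in ('aa', 'bb', 'ab', 'ba')}
print(results)
```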

If you want to avoid repeating yourself while writing the regex, I suggest composing it from a few sub-regexes:

sub_reg = r'c'
reg = re.compile(rf'^{sub_reg}|{sub_reg}$')

This has an additional advantage of improving readability if you give descriptive names to your variables.
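One caveat when the sub-regex itself contains '|', as in the question: it probably needs to be wrapped in a non-capturing group, otherwise ^ and $ only attach to the first and last alternatives. A sketch of that, where `token` is a hypothetical stand-in for the real sub-pattern and the helper name is made up:

```python
import re

# Hypothetical stand-in for the "complex" sub-pattern from the question.
token = r'x|y|z|[0-9_]'

# (?:...) keeps the inner alternation together, so ^ and $ apply to
# every alternative instead of only the nearest one.
starts_or_ends = re.compile(rf'^(?:{token})|(?:{token})$')

def starts_or_ends_with_token(text):
    """Return True if text begins or ends with something token matches."""
    return starts_or_ends.search(text) is not None

print(starts_or_ends_with_token('xa'))
print(starts_or_ends_with_token('a7'))
print(starts_or_ends_with_token('aya'))
```

With a plain single-character sub_reg such as r'c' the extra group makes no difference, so it is only needed once the sub-pattern has alternatives of its own.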

Answered By: Piotr Siupa