Problem about regular expression not matching with "\1"
Question:
I want to match a string that begins with ‘c’ or ends with ‘c’,
so I wrote a regular expression like this:
import re
reg = re.compile(r'^(c)|\1$')
reg.search('ca') # match
reg.search('ac') # not match
Actually, the ‘c’ part of the regexp is a complex sub-pattern like ‘x|y|z|[0-9_]|…’,
and I don’t want to write it twice in the regexp.
I thought it could work by referring back to the group with '\1',
but I don’t know why it doesn’t work.
I also tried a named group:
reg = re.compile(r'^(?P<name>c)|(?P=name)$')
and that doesn’t work either.
Answers:
In the case of 'ac', the first group doesn’t match, so \1 doesn’t mean anything later in the regular expression. The same happens with named groups.
This issue is not specific to Python 3.7.
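To see this concretely, here is a minimal sketch using the pattern from the question. For 'ac' the first alternative never matches, so group 1 is never set and the \1 in the second alternative has nothing to refer to; in Python's re module a backreference to an unset group simply fails to match.

```python
import re

# Pattern from the question: group 1 is only set if the first alternative matches.
reg = re.compile(r'^(c)|\1$')

print(reg.search('ca'))  # a match object (the leading 'c')
print(reg.search('ac'))  # None: group 1 is unset, so \1 cannot match
```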
This is not how regex backreferences work. Their purpose is to match the exact same substring multiple times in a single string.
For example, the regex ([ab])\1 will match 'aa' and 'bb' but not 'ab' or 'ba'.
Until capturing group 1 has matched, \1 is meaningless.
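A quick way to check the ([ab])\1 example yourself (the variable names here are just for illustration):

```python
import re

# \1 must repeat exactly what group 1 captured, not just "something [ab] could match".
pair = re.compile(r'([ab])\1')

for text in ['aa', 'bb', 'ab', 'ba']:
    print(text, bool(pair.fullmatch(text)))
# aa True
# bb True
# ab False
# ba False
```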
If you want to avoid repeating yourself while writing the regex, I suggest composing it from a few sub-regexes:
sub_reg = r'c'
reg = re.compile(rf'^{sub_reg}|{sub_reg}$')
This has an additional advantage of improving readability if you give descriptive names to your variables.
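One caveat if the real sub-pattern contains its own alternation, as the question suggests: it is probably safer to wrap each interpolated copy in a non-capturing group (?:...), otherwise the inner | merges with the outer one and the anchors only apply to the first and last branch. A sketch, assuming a shortened stand-in for the sub-pattern (the real one is elided in the question):

```python
import re

# Hypothetical stand-in for the complex sub-pattern from the question.
sub_reg = r'x|y|z|[0-9_]'

# (?:...) keeps the alternation inside sub_reg from leaking into the outer |.
reg = re.compile(rf'^(?:{sub_reg})|(?:{sub_reg})$')

# Without the grouping, ^ binds only to the first branch and $ only to the last.
naive = re.compile(rf'^{sub_reg}|{sub_reg}$')

print(bool(reg.search('xa')))     # True: begins with 'x'
print(bool(reg.search('a7')))     # True: ends with a digit
print(bool(reg.search('aya')))    # False: 'y' is only in the middle
print(bool(naive.search('aya')))  # True: the unanchored 'y' branch matches
```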