Why does this regex pattern match?
Question:
import re

text = '荣耀畅玩 40 Plus'
m = re.match(r'(^[^a-zA-Z0-9]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)', text)
print(m)
It prints out:
<re.Match object; span=(0, 12), match='荣耀畅玩 40 Plus'>
The text contains two spaces, but the pattern allows only one whitespace character (the single \s). Why does it match?
Answers:
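The first space is matched by the first group. [^a-zA-Z0-9]{2,} is a negated character class: it matches any character that is not an ASCII letter or digit, and a space is such a character. Only the second space is matched by \s. You can see that with m.groups():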
('荣耀畅玩 ', '40', 'Plus')
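To make the trailing space in the first group easier to see, here is a minimal sketch that prints each group with repr() together with its span. The names pattern and group_report are only illustrative.

```python
import re

text = '荣耀畅玩 40 Plus'
pattern = r'(^[^a-zA-Z0-9]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)'
m = re.match(pattern, text)

# Collect (group number, matched text, span) for the three capture groups.
# repr() makes the space at the end of group 1 visible when printed.
group_report = [(i, m.group(i), m.span(i)) for i in range(1, 4)]
for i, value, span in group_report:
    print(i, repr(value), span)
```

Group 1 ends at index 5, one position past the first space, so the \s in the pattern only has to account for the space before 'Plus'.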
You could add a space to the negated class so that the first group can no longer consume it, which makes the match fail:
r'(^[^a-zA-Z0-9 ]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)'
When you match 2 or more non-alphanumeric characters, spaces are included, because a space is not alphanumeric. I’d recommend trying the following, which prints None:
import re

text = '荣耀畅玩 40 Plus'
m = re.match(r'(^[^a-zA-Z0-9 ]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)', text)
print(m)
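As a rough comparison, the sketch below runs the original pattern and the space-excluding pattern against a few inputs. The second and third sample strings are made up for illustration, and the no_ws variant (which excludes every whitespace character via \s inside the negated class) is an assumption about what may be wanted, not something the question asked for.

```python
import re

original = r'(^[^a-zA-Z0-9]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)'
no_space = r'(^[^a-zA-Z0-9 ]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)'
# Assumed variant: exclude all whitespace (tabs etc.), not just ' '.
no_ws = r'(^[^a-zA-Z0-9\s]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)'

samples = [
    '荣耀畅玩 40 Plus',    # the string from the question (two spaces)
    '荣耀畅玩40 Plus',     # hypothetical: one space only
    '荣耀畅玩\t40 Plus',   # hypothetical: a tab instead of the first space
]

# For each sample, record whether each pattern matches.
results = {
    s: tuple(bool(re.match(p, s)) for p in (original, no_space, no_ws))
    for s in samples
}
for s, flags in results.items():
    print(repr(s), flags)
```

Excluding only ' ' still lets a tab slip into the first group, which is why the \s variant may be the safer choice if any whitespace there should cause a failure.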