Why does this regex pattern match?

Question:

import re

text = '荣耀畅玩 40 Plus'

m = re.match(r'(^[^a-zA-Z0-9]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)', text)
print(m)

It prints out:

<re.Match object; span=(0, 12), match='荣耀畅玩 40 Plus'>

The text contains two spaces, but the pattern allows only one whitespace character. Why does it match?

Asked By: marlon


Answers:

The first group, [^a-zA-Z0-9]{2,}, matches any character that is not an ASCII letter or digit, and that includes the space. You can see that with m.groups():

('荣耀畅玩 ', '40', 'Plus')

You could add a space to the excluded characters so that the first group can no longer consume it and the match fails.

r'(^[^a-zA-Z0-9 ]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)'
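To compare the two patterns side by side, here is a minimal sketch. It assumes the separator between the last two groups is \s (a single whitespace character), as the question describes; the names loose, strict, and compact are made up for illustration.

```python
import re

text = '荣耀畅玩 40 Plus'

# Original pattern: the negated class in group 1 also accepts a space.
loose = re.compile(r'(^[^a-zA-Z0-9]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)')
# Suggested pattern: a space is added to the excluded characters.
strict = re.compile(r'(^[^a-zA-Z0-9 ]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)')

loose_match = loose.match(text)
strict_match = strict.match(text)

print(loose_match.groups())  # ('荣耀畅玩 ', '40', 'Plus')
print(strict_match)          # None

# Hypothetical input with no space after the first part:
# the strict pattern still matches it.
compact = '荣耀畅玩40 Plus'
print(strict.match(compact).groups())  # ('荣耀畅玩', '40', 'Plus')
```

Note that the stricter pattern rejects the original text entirely, so it only helps if a space after the first part should count as invalid input.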
Answered By: tdelaney

A space is itself a non-alphanumeric character, so when you match two or more non-alphanumeric characters with [^a-zA-Z0-9]{2,}, spaces are included.
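A quick way to check which characters a negated class accepts is to test them one at a time. This is only an illustrative sketch; the sample characters and the names negated, samples, and results are chosen here, not taken from the question.

```python
import re

# The negated class from the first group, without the {2,} quantifier.
negated = re.compile(r'[^a-zA-Z0-9]')

samples = [' ', '\t', '-', '荣', 'a', '7']
results = {ch: bool(negated.fullmatch(ch)) for ch in samples}

for ch, matched in results.items():
    print(repr(ch), matched)
```

The space, tab, hyphen, and CJK character all match, while the ASCII letter and digit do not.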

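I'd recommend excluding the space from the first group as well:

import re

text = '荣耀畅玩 40 Plus'

m = re.match(r'(^[^a-zA-Z0-9 ]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)', text)
print(m)  # None: the first group can no longer absorb the space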
Answered By: Isaac Yee