Why does this regex pattern match?

Question

import re

text = '荣耀畅玩 40 Plus'

m = re.match(r'(^[^a-zA-Z0-9]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)', text)
print(m)

It prints out:

<re.Match object; span=(0, 12), match='荣耀畅玩 40 Plus'>

The text contains two spaces, but the pattern allows only a single whitespace character. Why does it match?

Asked By: marlon

||

Answer 1

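The first capture group, [^a-zA-Z0-9]{2,}, matches any character that is not an ASCII letter or digit, and that includes the space. So the first space is consumed by group 1, not by the whitespace token. You can see this with m.groups():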

('荣耀畅玩 ', '40', 'Plus')

You could add a space to the negated character class so that the first group can no longer consume it:

r'(^[^a-zA-Z0-9 ]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)'
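A minimal sketch comparing the two patterns side by side, assuming the separator between the last two groups is the whitespace token `\s`. The variable names and the second input string (with no space before the digits) are made up for illustration.

```python
import re

text = '荣耀畅玩 40 Plus'

# Original pattern: the negated class in group 1 also accepts spaces.
original = r'(^[^a-zA-Z0-9]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)'
# Adjusted pattern: a space is added to the negated class.
fixed = r'(^[^a-zA-Z0-9 ]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)'

m_original = re.match(original, text)
print(m_original.groups())  # ('荣耀畅玩 ', '40', 'Plus'): group 1 ends with a space

m_fixed = re.match(fixed, text)
print(m_fixed)  # None: the space before '40' has nowhere to go

# Hypothetical input with only one space, between '40' and 'Plus'.
m_single = re.match(fixed, '荣耀畅玩40 Plus')
print(m_single.groups())  # ('荣耀畅玩', '40', 'Plus')
```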

Answered By: tdelaney

Answer 2

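The class [^a-zA-Z0-9] matches any non-alphanumeric character, and a space is one of them, so the "2 or more" in the first group swallows the first space.

I’d recommend excluding the space from that class:

import re

text = '荣耀畅玩 40 Plus'

m = re.match(r'(^[^a-zA-Z0-9 ]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)', text)
print(m)  # None: the space before '40' is no longer allowed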
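One possible refinement, offered as a sketch and not as part of either answer: excluding only the ASCII space still lets other whitespace through, such as a tab or the ideographic space U+3000 that often appears in Chinese text. Putting `\s` inside the negated class excludes all whitespace that Python's `re` recognizes for `str` patterns. The input string below is a hypothetical example.

```python
import re

# Excludes only the ASCII space from group 1.
space_only = r'(^[^a-zA-Z0-9 ]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)'
# Excludes every whitespace character from group 1.
any_space = r'(^[^a-zA-Z0-9\s]{2,})([a-zA-Z0-9]{2,})\s([a-zA-Z0-9]{2,}$)'

# Hypothetical input: an ideographic space (U+3000) before the digits.
text_ideographic = '荣耀畅玩\u300040 Plus'

m_space_only = re.match(space_only, text_ideographic)
print(m_space_only is not None)  # True: U+3000 is still consumed by group 1

m_any_space = re.match(any_space, text_ideographic)
print(m_any_space)  # None: U+3000 is whitespace, so group 1 cannot take it
```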

Answered By: Isaac Yee
