"^" makes no difference in Python regex matching, but does in other regex testers
Question:
I have a regex pattern where I’m trying to match strings with the given format:
string1 = 'test_1.0.0_20220728_151206.log'
According to my regex helper app (Patterns on Mac), this regex matches the above:
pattern = '[a-z0-9]+_[0-9]+.[0-9]+.[0-9]+_[0-9]{8}_[0-9]{6}.log'
However, this pattern would also match the following string:
string2 = '_test_1.0.0_20220728_151206.log'
Since I don’t want this string matched, I modified it to add a ^ at the beginning of the regex which correctly matches the first string and not the second:
pattern = '^[a-z0-9]+_[0-9]+.[0-9]+.[0-9]+_[0-9]{8}_[0-9]{6}.log'
However, in Python, when I use re.match(pattern, string) with either pattern, string1 is always matched and string2 is never matched. This is the behavior I want, but I don't understand why the ^ makes no difference to Python's matching:
>>> import re
# String 1 matches both patterns
>>> re.match('[a-z0-9]+_[0-9]+.[0-9]+.[0-9]+_[0-9]{8}_[0-9]{6}.log',
...          'test_1.0.0_20220728_151206.log')
<re.Match object; span=(0, 30), match='test_1.0.0_20220728_151206.log'>
>>> re.match('^[a-z0-9]+_[0-9]+.[0-9]+.[0-9]+_[0-9]{8}_[0-9]{6}.log',
...          'test_1.0.0_20220728_151206.log')
<re.Match object; span=(0, 30), match='test_1.0.0_20220728_151206.log'>
# String 2 does not
>>> re.match('[a-z0-9]+_[0-9]+.[0-9]+.[0-9]+_[0-9]{8}_[0-9]{6}.log',
...          '_test_1.0.0_20220728_151206.log')
>>> re.match('^[a-z0-9]+_[0-9]+.[0-9]+.[0-9]+_[0-9]{8}_[0-9]{6}.log',
...          '_test_1.0.0_20220728_151206.log')
For anyone using the Patterns app, I use the "Default Flavor" and "Multi-line (^$)" is checked.
What am I missing here?
Answers:
To quote the docs:
Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string.
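In other words, re.match() already behaves as if the pattern started with ^, which is why adding it changes nothing. A regex tester that scans the whole input is presumably doing the equivalent of re.search(), where the anchor does matter. The snippet below is a minimal sketch of that difference using the strings and pattern from the question; the variable names match_result, search_result and anchored_result are only illustrative.

```python
import re

# The pattern from the question, unchanged.
pattern = '[a-z0-9]+_[0-9]+.[0-9]+.[0-9]+_[0-9]{8}_[0-9]{6}.log'
string2 = '_test_1.0.0_20220728_151206.log'

# re.match only tries position 0, so the leading "_" prevents a match.
match_result = re.match(pattern, string2)

# re.search scans forward and finds the match that starts at index 1.
search_result = re.search(pattern, string2)

# With "^" in front, re.search is anchored to the start and fails as well.
anchored_result = re.search('^' + pattern, string2)

print(match_result)          # None
print(search_result.span())  # (1, 31)
print(anchored_result)       # None
```

So to reproduce what the tester shows, use re.search() and keep the ^ in the pattern; to keep using re.match(), the ^ is redundant but harmless.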