Checking whole string with a regex
Question:
I’m trying to check if a string is a number, so the regex “\d+” seemed good. However, that regex also matches “78.46.92.168:8000” for some reason, which I do not want. A little bit of code:
import re

class Foo():
    _rex = re.compile(r"\d+")
    def bar(self, string):
        # _rex is a class attribute, so it has to be reached through self
        m = self._rex.match(string)
        if m is not None:
            doStuff()
And doStuff() is called when the IP address is entered. I’m kind of confused: how does “.” or “:” match “\d”?
Answers:
Change it from \d+ to ^\d+$
\d+ matches any positive number of digits within your string, so it matches the first 78 and succeeds.
Use ^\d+$.
Or, even better: "78.46.92.168:8000".isdigit()
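To make the difference concrete, here is a minimal sketch using the address from the question. The variable names are chosen for illustration only.

```python
import re

address = "78.46.92.168:8000"

# Unanchored: re.match only pins the start of the string,
# so the leading run of digits is enough for a match.
prefix = re.match(r"\d+", address)
print(prefix.group())                  # 78

# Anchored at both ends: the whole string must consist of digits.
print(re.match(r"^\d+$", address))     # None
print(re.match(r"^\d+$", "8000") is not None)   # True
```

Neither "." nor ":" ever matches \d; the unanchored pattern simply stops before the first dot and reports success.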
re.match() always matches from the start of the string (unlike re.search()) but allows the match to end before the end of the string.
Therefore, you need an anchor: compiling the pattern as _rex = re.compile(r"\d+$") and then calling _rex.match(string) would work.
To be more explicit, you could also compile r"^\d+$" (the ^ is redundant with match()) or drop match() altogether and call _rex.search(string) with the pattern r"^\d+$".
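The two variants can be put side by side as follows. This is only a sketch; the constant and function names are hypothetical and not part of the question's code.

```python
import re

# Hypothetical names, chosen for this illustration only.
DIGITS_TO_END = re.compile(r"\d+$")     # relies on match() pinning the start
DIGITS_ANCHORED = re.compile(r"^\d+$")  # pins both ends itself

def is_number_with_match(text):
    # match() starts at position 0; $ takes care of the end.
    return DIGITS_TO_END.match(text) is not None

def is_number_with_search(text):
    # search() may start anywhere, so the pattern needs its own ^.
    return DIGITS_ANCHORED.search(text) is not None

address = "78.46.92.168:8000"
print(is_number_with_match(address))    # False
print(is_number_with_search(address))   # False
print(is_number_with_match("8000"))     # True

# Without ^, search() happily finds the trailing port number.
print(DIGITS_TO_END.search(address).group())   # 8000
```

The last line shows why the start anchor stops being redundant once match() is replaced by search().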
\Z matches the end of the string while $ matches the end of the string or just before the newline at the end of the string, and exhibits different behaviour in re.MULTILINE. See the syntax documentation for detailed information.
>>> s="1234\n"
>>> re.search(r"^\d+\Z",s)
>>> s="1234"
>>> re.search(r"^\d+\Z",s)
<_sre.SRE_Match object at 0xb762ed40>
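For comparison, a short sketch of how $ behaves on the same kind of input, with and without re.MULTILINE. The sample strings are made up for the illustration.

```python
import re

with_newline = "1234\n"

# $ tolerates a single trailing newline; \Z does not.
dollar_hit = re.search(r"^\d+$", with_newline) is not None
big_z_hit = re.search(r"^\d+\Z", with_newline) is not None
print(dollar_hit, big_z_hit)        # True False

two_lines = "12\nabc"

# By default, $ only matches at the end of the string (or before a final newline).
default_hit = re.search(r"^\d+$", two_lines) is not None
# With re.MULTILINE, $ also matches before every newline.
multiline_hit = re.search(r"^\d+$", two_lines, re.MULTILINE) is not None
print(default_hit, multiline_hit)   # False True
```

If a trailing newline must be rejected, \Z is the safer choice of the two.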
There are a couple of options in Python to match an entire input with a regex.
Python 2 and 3
In Python 2 and 3, you may use
re.match(r'\d+$', s) # re.match anchors the match at the start of the string, so $ is what remains to add
or, to avoid matching before the final \n in the string:
re.match(r'\d+\Z', s) # \Z will only match at the very end of the string
Or the same as above with the re.search method, which requires the ^ / \A start-of-string anchor because it does not anchor the match at the start of the string:
re.search(r'^\d+$', s)
re.search(r'\A\d+\Z', s)
Note that \A is an unambiguous string start anchor; its behavior cannot be redefined with any modifiers (re.M / re.MULTILINE can only redefine the ^ and $ behavior).
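A small sketch of that last point, assuming a two-line sample string invented for the purpose:

```python
import re

text = "abc\n123"

# Under re.MULTILINE, ^ also matches right after each newline,
# so the digits on the second line are found.
caret = re.search(r"^\d+", text, re.MULTILINE)
print(caret.group())      # 123

# \A still means "start of the whole string", whatever the flags say.
big_a = re.search(r"\A\d+", text, re.MULTILINE)
print(big_a)              # None
```

This is why the \A ... \Z pair keeps working as a whole-string check even when someone later adds re.MULTILINE to the call.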
Python 3
All the cases described in the above section, plus one more useful method, re.fullmatch (also present in the PyPI regex module):
If the whole string matches the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.
So, after you compile the regex, just use the appropriate method:
_rex = re.compile(r"\d+")
if _rex.fullmatch(s):
    doStuff()
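Wrapped in a function, the fullmatch approach might look like the sketch below. The helper name is_number is hypothetical, and Python 3.4 or later is assumed, since that is where fullmatch was added.

```python
import re

_rex = re.compile(r"\d+")

def is_number(s):
    # fullmatch() succeeds only if the pattern covers the entire string,
    # so no ^, $, \A or \Z is needed in the pattern itself.
    return _rex.fullmatch(s) is not None

print(is_number("8000"))                # True
print(is_number("78.46.92.168:8000"))   # False
print(is_number("1234\n"))              # False (trailing newline is rejected)
print(is_number(""))                    # False (\d+ needs at least one digit)
```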