Python — Regex match pattern OR end of string
Question:
import re
re.findall(r"(\+?1?[ -.]?\(?\d{3}\)?[ -.]?\d{3}[ -.]?\d{4})(?:[ <$])", "+1.222.222.2222<")
The above code works fine if my string ends with a "<" or space. But if it’s the end of the string, it doesn’t work. How do I get +1.222.222.2222 to return in this condition:
import re
re.findall(r"(\+?1?[ -.]?\(?\d{3}\)?[ -.]?\d{3}[ -.]?\d{4})(?:[ <$])", "+1.222.222.2222")
I removed the "<" so that the number ends the string. This returns an empty list, but I'd like it to return the full string: +1.222.222.2222
POSSIBLE ANSWER:
import re
re.findall(r"(\+?1?[ -.]?\(?\d{3}\)?[ -.]?\d{3}[ -.]?\d{4})(?:[ <]|$)", "+1.222.222.2222")
Answers:
I think you’ve solved the end-of-string issue, but there are a couple of other potential issues with the pattern in your question:
- the - in [ -.] either needs to be escaped as \- or placed in the first or last position within square brackets, e.g. [-. ] or [ .-]; if you search for [] in the Python re docs you'll find the relevant info:
Ranges of characters can be indicated by giving two characters and separating them
by a '-', for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match
all the two-digits numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal
digit. If - is escaped (e.g. [a\-z]) or if it's placed as the first or last character
(e.g. [-a] or [a-]), it will match a literal '-'.
- you may want to require that either matching parentheses or none are present around the first 3 of 10 digits using (?:\(\d{3}\) ?|\d{3}[-. ]?)
Here's a possible tweak incorporating the above:
import re
pat = r"^((?:\+1[-. ]?|1[-. ]?)?(?:\(\d{3}\) ?|\d{3}[-. ]?)\d{3}[-. ]?\d{4})(?:[ <]|$)"
print( re.findall(pat, "+1.222.222.2222") )
print( re.findall(pat, "+1(222)222.2222") )
print( re.findall(pat, "+1(222.222.2222") )
Output:
['+1.222.222.2222']
['+1(222)222.2222']
[]
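To make the hyphen point concrete, here is a small sketch comparing the two character classes on their own. The helper name matched_chars and the list of candidate characters are made up for this illustration; they are not part of the question or the answer above.

```python
import re

# " -." inside [...] is a range from space (0x20) through "." (0x2E).
range_class = re.compile(r"[ -.]")
# With the hyphen first, the class holds exactly "-", "." and " ".
literal_class = re.compile(r"[-. ]")

def matched_chars(pattern, candidates=" -.*+(#/"):
    """Return the candidate characters accepted by a one-character class."""
    return [ch for ch in candidates if pattern.fullmatch(ch)]

print(matched_chars(range_class))    # space, "-", ".", and also "*", "+", "(", "#"
print(matched_chars(literal_class))  # only space, "-", "."
```

The range version happens to accept "-" because it falls between space and "." in ASCII, which is likely why the original pattern appeared to work, but it also lets characters such as "*" or "#" act as separators.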
Maybe try:
import re
re.findall(r"(\+?1?[ -.]?\(?\d{3}\)?[ -.]?\d{3}[ -.]?\d{4})(?:| |<|$)", "+1.222.222.2222")
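The alternatives in (?:| |<|$) are:
- (empty): matches at any position, e.g. +1.222.222.2222
- " " (space): matches a space character, e.g. "+1.222.222.2222 "
- <: matches a less-than sign, e.g. +1.222.222.2222<
- $: matches at the end of the string, e.g. +1.222.222.2222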
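For comparison, the sketch below runs the three trailing groups discussed in this thread against a few inputs. It assumes the separator class is written as [-. ] (hyphen first) so that only the trailing group differs; the names number, tails, samples and results are invented for this example.

```python
import re

# Number part shared by all variants (separator class written as [-. ]).
number = r"(\+?1?[-. ]?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4})"

tails = {
    "class": r"(?:[ <$])",         # inside [...], "$" is a literal dollar sign
    "alternation": r"(?:[ <]|$)",  # space, "<", or end of string
    "empty": r"(?:| |<|$)",        # the empty branch is tried first and always succeeds
}

samples = [
    "+1.222.222.2222",    # number ends the string
    "+1.222.222.2222<",   # followed by "<"
    "+1.222.222.2222$",   # followed by a literal dollar sign
    "+1.222.222.22229",   # followed by an extra digit
]

def results(tail):
    """Run findall with the given trailing group on every sample."""
    return [re.findall(number + tail, s) for s in samples]

for name, tail in tails.items():
    print(name, results(tail))
```

With these inputs, the character-class version misses the bare number and accepts a literal "$", the alternation version accepts only the bare number and the "<" case, and the version with the empty branch accepts all four, including the one with an extra trailing digit. If that last behaviour is not wanted, (?:[ <]|$) is probably the safer choice.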
You can also use regex101 for easier debugging.