Check if string matches pattern
Question:
How do I check if a string matches this pattern?
Uppercase letter, number(s), uppercase letter, number(s)…
For example, these would match:
A1B2
B10L1
C1N200J1
These wouldn’t (‘^’ points to the problem):
a1B2
^
A10B
    ^
AB400
 ^
Answers:
import re
pattern = re.compile("^([A-Z][0-9]+)+$")
pattern.match(string)
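As a quick check, the pattern above can be run against the examples from the question. This is only a minimal sketch; the helper name `matches` and the two sample lists are illustrative and not part of the original answer.

```python
import re

# Pattern from the answer above: one or more (uppercase letter + digits) pairs.
pattern = re.compile(r"^([A-Z][0-9]+)+$")

def matches(s):
    """Return True if s consists only of letter+digits pairs (hypothetical helper)."""
    return pattern.match(s) is not None

should_match = ["A1B2", "B10L1", "C1N200J1"]
should_not_match = ["a1B2", "A10B", "AB400"]

for s in should_match + should_not_match:
    print(s, matches(s))
```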
import re
import sys
prog = re.compile(r'([A-Z]\d+)+')
while True:
    line = sys.stdin.readline()
    if not line:  # stop at end of input
        break
    if prog.match(line):
        print('matched')
    else:
        print('not matched')
Regular expressions make this easy …
[A-Z]
will match exactly one character between A and Z
\d+
will match one or more digits
()
group things (and also return things… but for now just think of them as grouping)
+
matches 1 or more of the preceding item
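The same building blocks can be written out with re.VERBOSE so that each piece carries its own comment. This is a sketch only; the variable name `pair_pattern` is made up for illustration.

```python
import re

pair_pattern = re.compile(
    r"""
    (           # start a group
      [A-Z]     # exactly one uppercase letter
      \d+       # one or more digits
    )+          # repeat the whole group one or more times
    """,
    re.VERBOSE,
)

print(bool(pair_pattern.fullmatch("C1N200J1")))  # True
print(bool(pair_pattern.fullmatch("AB400")))     # False
```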
import re
ab = re.compile("^([A-Z]{1}[0-9]{1})+$")
ab.match(string)
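I believe that should work for an uppercase, number pattern. Note that {1} limits each number to a single digit, so B10L1 would not match; use [0-9]+ to allow several digits.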
Please try the following:
import re
name = ["A1B1", "djdd", "B2C4", "C2H2", "jdoi", "1A4V"]
# Match names.
for element in name:
    m = re.match(r"(^[A-Z]\d[A-Z]\d)", element)
    if m:
        print(m.groups())
One-liner: re.match(r"pattern", string) # No need to compile
>>> import re
>>> if re.match(r"hello[0-9]+", 'hello1'):
...     print('Yes')
...
Yes
You can evaluate it as a bool if needed:
>>> bool(re.match(r"hello[0-9]+", 'hello1'))
True
As stated in the comments, all these answers using re.match implicitly match only at the start of the string. re.search is needed if you want to find a match anywhere in the string.
import re
pattern = re.compile("([A-Z][0-9]+)+")
# finds match anywhere in string
bool(re.search(pattern, 'aA1A1')) # True
# matches only at the start of the string, even though the pattern has no ^ anchor
bool(re.match(pattern, 'aA1A1')) # False
If you need the full string to exactly match the regex, see @Ali Sajjad’s answer using re.fullmatch
Credit: @LondonRob and @conradkleinespel in the comments.
Careful! (Maybe you want to check if FULL string matches)
re.match(...) will not work if you want to match the full string, because it only requires a match at the start of the string.
For example:
re.match("[a-z]+", "abcdef")
✅ will give a match
But!
re.match("[a-z]+", "abcdef 12345")
✅ will also give a match, because the start of the string matches (maybe you don’t want that when you’re checking whether the entire string is valid)
Solution
Use re.fullmatch(...). This will only match if the entire string matches the pattern:
if re.fullmatch("[a-z]+", my_string):
    print("Yes")
Example
re.fullmatch("[a-z]+", "abcdef")
✅ Yes
re.fullmatch("[a-z]+", "abcdef 12345")
❌ No
One liner: bool(re.fullmatch("[a-z]+", my_string))
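One further difference may be worth knowing, sketched here with the pattern from the question: with re.match, an explicit ^...$ still accepts a string that ends in a newline, because $ also matches just before a trailing newline. re.fullmatch does not. The variable name `pair` is only for illustration.

```python
import re

pair = r"([A-Z][0-9]+)+"

# $ matches at the very end of the string, or just before a final newline
print(bool(re.match("^" + pair + "$", "A1B2\n")))  # True
# fullmatch requires the pattern to consume the entire string
print(bool(re.fullmatch(pair, "A1B2\n")))          # False
print(bool(re.fullmatch(pair, "A1B2")))            # True
```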
Ali Sajjad’s answer should be the default, i.e. use fullmatch to avoid false positives.
However, it’s also important to know that "yes, it’s a match" always means checking that the result is not None.
The two possibilities are therefore:
if re.fullmatch("[a-z]+", my_string) is not None:
or, as in Ali’s answer:
if bool(re.fullmatch("[a-z]+", my_string)):
To my way of thinking both of these are really quite horribly unreadable. So a simple utility function is needed for readability:
import re
def is_match(pattern, string, flags=re.IGNORECASE | re.DOTALL):  # or "is_full_match", as desired
    return re.fullmatch(pattern, string, flags) is not None
Those 2 flags are (usually) the most helpful default flag settings in my experience, rather than "0".
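One caveat with these defaults, shown as a small sketch: for a case-sensitive pattern like the one in the question, re.IGNORECASE changes the result, so the flags have to be overridden there. The helper is repeated so the snippet runs on its own, and the name `pair` is only for illustration.

```python
import re

def is_match(pattern, string, flags=re.IGNORECASE | re.DOTALL):
    return re.fullmatch(pattern, string, flags) is not None

pair = r"([A-Z][0-9]+)+"

print(is_match(pair, "a1B2"))           # True: IGNORECASE lets 'a' match [A-Z]
print(is_match(pair, "a1B2", flags=0))  # False: case-sensitive again
```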
In practice, of course, you may need to examine the Match object delivered by re.fullmatch. But for cases where you just need to find whether there’s a match…
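When the Match object itself is needed, something like the following sketch can be used. Note that a repeated group only keeps its last repetition, so re.findall is used here to list every letter/number pair; that choice is an illustration, not part of the answer above.

```python
import re

m = re.fullmatch(r"([A-Z][0-9]+)+", "C1N200J1")
if m:
    print(m.group(0))  # whole match: C1N200J1
    print(m.span())    # (0, 8)
    print(m.group(1))  # last repetition of the group: J1

# Collect every pair separately once the string is known to be valid
pairs = re.findall(r"[A-Z][0-9]+", "C1N200J1")
print(pairs)           # ['C1', 'N200', 'J1']
```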