Trying to understand Python regexes
Question:
I am trying to write a Python regex to capture the full name of someone whose last name is Nakamoto. You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:
'Satoshi Nakamoto'
'Alice Nakamoto'
'RoboCop Nakamoto'
but not the following:
'satoshi Nakamoto' (where the first name is not capitalised)
'Mr. Nakamoto' (where the preceding word has a nonletter character)
'Nakamoto' (which has no first name)
'Satoshi nakamoto' (where Nakamoto is not capitalised)
I have used the following regex: [A-Z][a-z]+\sNakamoto
However, this matches both 'Satoshi Nakamoto' and 'satoshi Nakamoto'. I would like to understand where I am going wrong and how to correct it. Here is my code:
import re  # import the regular expression module
# Regular expression (case-insensitive, verbose mode)
NameSearch = re.compile(r'[A-Z][a-z]+\sNakamoto', re.I | re.VERBOSE)
# Perform the search on a string
Result = NameSearch.search("Satoshi Nakamoto")
# Debug code to check whether it found a match
print(Result is None)
if Result is not None:
    print(Result.group())
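One way to see what is going on is to run the same pattern with and without re.I against both strings. A minimal sketch follows; the helper name `matches` is hypothetical, and re.VERBOSE is left out because the pattern has no whitespace or comments for it to act on.

```python
import re

PATTERN = r'[A-Z][a-z]+\sNakamoto'

# Same pattern, once case-insensitive (as in the code above) and once case-sensitive.
with_ignorecase = re.compile(PATTERN, re.I)
case_sensitive = re.compile(PATTERN)

def matches(regex, text):
    """Hypothetical helper: return the matched substring, or None."""
    m = regex.search(text)
    return m.group() if m else None

for text in ('Satoshi Nakamoto', 'satoshi Nakamoto'):
    # Columns: input, result with re.I, result without re.I
    print(repr(text), matches(with_ignorecase, text), matches(case_sensitive, text))
```

With re.I the class [A-Z] also accepts the lowercase 's', which is why 'satoshi Nakamoto' slips through; without the flag it returns None.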
Answers:
re.I means ignore case, so the explicit uppercase class you used will match both upper- and lowercase letters anyway. Don’t use re.I. Also, to match “RoboCop”, you need to accept more than one capital letter in a name, so you probably want:
NameSearch = re.compile(r'\b[A-Z][a-zA-Z]+\sNakamoto\b', re.VERBOSE)
or the like. This also uses \b as a word-boundary anchor, so you don’t match partway through a string like fooBar Nakamoto.
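To check the pattern \b[A-Z][a-zA-Z]+\sNakamoto\b against every case listed in the question, a minimal sketch could loop over the two lists. The list names are illustrative, and re.VERBOSE is dropped since this pattern contains no whitespace or comments for it to affect.

```python
import re

# A capitalised first word, whitespace, then "Nakamoto", with \b word boundaries
# on both ends. No re.I, so [A-Z] only accepts an uppercase first letter.
name_search = re.compile(r'\b[A-Z][a-zA-Z]+\sNakamoto\b')

should_match = ['Satoshi Nakamoto', 'Alice Nakamoto', 'RoboCop Nakamoto']
should_not_match = ['satoshi Nakamoto', 'Mr. Nakamoto', 'Nakamoto', 'Satoshi nakamoto']

for text in should_match + should_not_match:
    m = name_search.search(text)
    # Print the input and the matched text (None when there is no match)
    print(repr(text), '->', m.group() if m else None)
```

'Mr. Nakamoto' fails because the '.' is neither a letter nor whitespace, and the leading \b is what keeps fooBar Nakamoto from matching as "Bar Nakamoto".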
Your regular expression actually works fine over here, but it will not match the “RoboCop Nakamoto” case.
import re
def printMatch(name):
    # \b: word boundary; [A-Z]: capital first letter; [a-zA-Z]+: rest of the name
    pat = re.compile(r'\b[A-Z][a-zA-Z]+\sNakamoto')
    if pat.search(name):
        print('"' + name + '" matches')
    else:
        print('"' + name + '" does not match')
printMatch('test satoshi Nakamoto test')
printMatch('test Satoshi Nakamoto test')
printMatch('test RoboCop Nakamoto test')
printMatch('test roboCop Nakamoto test')
The output is this:
"test satoshi Nakamoto test" does not match
"test Satoshi Nakamoto test" matches
"test RoboCop Nakamoto test" matches
"test roboCop Nakamoto test" does not match
The one that worked for me:
rgx = re.compile(r'^[A-Z]\w+ Nakamoto')
You can check here: https://regex101.com/r/lNE320/1
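A quick check of this pattern against the question's cases, as a sketch. Two details are worth knowing: ^ anchors the match to the start of the string, and \w matches letters, digits and the underscore, so a first name such as 'R2D2' is also accepted. The last two sample strings are made-up inputs for illustration, not from the question.

```python
import re

rgx = re.compile(r'^[A-Z]\w+ Nakamoto')

cases = ['Satoshi Nakamoto', 'RoboCop Nakamoto', 'satoshi Nakamoto',
         'Mr. Nakamoto', 'Nakamoto', 'Satoshi nakamoto',
         # Made-up inputs showing the effect of ^ and \w:
         'His name is Satoshi Nakamoto', 'R2D2 Nakamoto']

for text in cases:
    # True if the pattern matches at the start of the string
    print(repr(text), bool(rgx.search(text)))
```

Because of the ^, a name in the middle of a sentence is not found; drop it (or use \b instead) if that matters for your input.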
I have written the following code, but it is not working either. I believe it is correct.
nakamotoRegex = re.compile(r'[^A-Z][a-z]+ Nakamoto')
mo = nakamotoRegex.search('His name is Rob Nakamoto')
mo.group()
The output is the following:
'ob Nakamoto'
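The output makes sense once you know that a ^ placed first inside square brackets negates the class: [^A-Z] means "any single character that is not an uppercase letter", so the match starts at the lowercase 'o' in 'Rob'. Using [A-Z] instead, with \b so the match cannot start mid-word, returns the full name. A minimal sketch comparing the two; the variable names are illustrative, and [a-zA-Z]+ is used so that names like RoboCop would also be accepted.

```python
import re

text = 'His name is Rob Nakamoto'

# [^A-Z] is a negated class: one character that is NOT an uppercase letter.
negated = re.compile(r'[^A-Z][a-z]+ Nakamoto')
# [A-Z] requires the first character to BE uppercase; \b prevents starting mid-word.
fixed = re.compile(r'\b[A-Z][a-zA-Z]+ Nakamoto')

print(negated.search(text).group())  # ob Nakamoto
print(fixed.search(text).group())    # Rob Nakamoto
```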