Trying to understand Python regexes

Question:

I am trying to write a Python regex to capture the full name of someone whose last name is Nakamoto. You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:

'Satoshi Nakamoto'
'Alice Nakamoto'
'RoboCop Nakamoto'

but not the following:

'satoshi Nakamoto' (where the first name is not capitalised)
'Mr. Nakamoto' (where the preceding word has a nonletter character)
'Nakamoto' (which has no first name)
'Satoshi nakamoto' (where Nakamoto is not capitalised)

I have used the following regex: [A-Z][a-z]+\sNakamoto

However, this matches both Satoshi Nakamoto and satoshi Nakamoto. I would like to understand where I am going wrong and how to correct it. Here is my code:

import re    #import regular expressions

#regular expression
NameSearch = re.compile(r'[A-Z][a-z]+\sNakamoto', re.I | re.VERBOSE)

# perform search on string
Result = NameSearch.search("Satoshi Nakamoto")

#Debug code to check if it found a match or not
print (Result == None)

if Result != None:
    print (Result.group())
Asked By: user92592


Answers:

re.I means ignore case, so the explicit upper case class you used will match both upper and lower case anyway. Don’t use re.I. Also, to match “RoboCop”, you need to accept more than one capital letter in a name, so you probably want:

NameSearch = re.compile(r'\b[A-Z][a-zA-Z]+\sNakamoto\b', re.VERBOSE)

or the like. This also uses \b as a word boundary detector so you don’t match partway through a string like fooBar Nakamoto.
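As a quick check, here is a minimal sketch that runs the case-sensitive pattern against every example from the question, next to a re.I variant for comparison. The names name_pattern, ignore_case and matches are only illustrative, and re.VERBOSE is left out because the pattern contains no whitespace or comments.

```python
import re

# Case-sensitive: without re.I, [A-Z] really means "uppercase only".
name_pattern = re.compile(r'\b[A-Z][a-zA-Z]+\sNakamoto\b')

# The question's pattern with re.I, kept only to show the difference.
ignore_case = re.compile(r'[A-Z][a-z]+\sNakamoto', re.I)

should_match = ['Satoshi Nakamoto', 'Alice Nakamoto', 'RoboCop Nakamoto']
should_not_match = ['satoshi Nakamoto', 'Mr. Nakamoto', 'Nakamoto', 'Satoshi nakamoto']

def matches(text):
    """Return True if the case-sensitive pattern is found anywhere in text."""
    return name_pattern.search(text) is not None

for text in should_match + should_not_match:
    print(repr(text), matches(text))

# With re.I the lowercase first name is accepted, which is the reported problem.
print(ignore_case.search('satoshi Nakamoto') is not None)
```

The first three lines should print True and the next four False; the final line prints True because re.I makes [A-Z] accept lowercase letters as well.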

Answered By: ShadowRanger

Your regular expression actually works fine over here once the re.I flag is dropped, but it will not match the “RoboCop Nakamoto” case. Allowing uppercase letters after the first one handles that:

import re

def printMatch(name):
    # \b keeps the match from starting in the middle of a word
    pat = re.compile(r'\b[A-Z][a-zA-Z]+\sNakamoto')
    if pat.search(name):
        print('"' + name + '" matches')
    else:
        print('"' + name + '" does not match')

printMatch('test satoshi Nakamoto test')
printMatch('test Satoshi Nakamoto test')
printMatch('test RoboCop Nakamoto test')
printMatch('test roboCop Nakamoto test')

The output is this:

"test satoshi Nakamoto test" does not match
"test Satoshi Nakamoto test" matches
"test RoboCop Nakamoto test" matches
"test roboCop Nakamoto test" does not match

The one that worked for me:

rgx = re.compile(r'^[A-Z]\w+ Nakamoto')

You can check here: https://regex101.com/r/lNE320/1
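Two things are worth keeping in mind with this variant, shown in the small sketch below. The ^ anchor means the name has to sit at the very start of the string, and \w also accepts digits and underscores, so it is a little looser than "letters only". The sample strings and the name anchored are just for illustration.

```python
import re

anchored = re.compile(r'^[A-Z]\w+ Nakamoto')

samples = [
    'Satoshi Nakamoto',            # name at the start of the string
    'test Satoshi Nakamoto test',  # name in the middle of the string
    'R2D2 Nakamoto',               # digits are allowed by \w
    'Mr. Nakamoto',                # '.' is not matched by \w
]

for text in samples:
    print(repr(text), anchored.search(text) is not None)
```

This should print True, False, True, False. If the name can appear anywhere in a longer string, a \b word boundary is probably a better fit than ^.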

Answered By: BlackPioter

I have written the following code, but it is not working either. I believe it is correct.

nakamotoRegex = re.compile(r'[^A-Z][a-z]+ Nakamoto')
mo = nakamotoRegex.search('His name is Rob Nakamoto')
mo.group()

The output is the following:
'ob Nakamoto'
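The likely cause is the ^ inside the brackets: at the start of a character class it negates the class, so [^A-Z] means "any character that is not an uppercase letter". The 'R' is therefore skipped and the match starts at 'o'. A minimal sketch comparing the two classes on the same string (the names negated and positive are only illustrative):

```python
import re

text = 'His name is Rob Nakamoto'

negated = re.compile(r'[^A-Z][a-z]+ Nakamoto')   # first character must NOT be uppercase
positive = re.compile(r'[A-Z][a-z]+ Nakamoto')   # first character must be uppercase

print(negated.search(text).group())    # 'ob Nakamoto'
print(positive.search(text).group())   # 'Rob Nakamoto'
```

To anchor at the start of the string instead, the ^ has to go outside the brackets, as in ^[A-Z].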
