Cannot extract all words using word or whitespace boundary with regex
Question:
I need to extract Male-Cat twice:
import re
a = "Male-Cat Male-Cat Male-Cat-Female"
b = re.findall(r'(?:\s|^)Male-Cat(?:\s|$)', a)
print(b)
['Male-Cat ']
c = re.findall(r'\bMale-Cat\b', a)
print(c)
['Male-Cat', 'Male-Cat', 'Male-Cat']
I need to extract Male-Cat three times:
a = "Male-Cat Male-Cat Male-Cat"
b = re.findall(r'(?:\s|^)Male-Cat(?:\s|$)', a)
print(b)
['Male-Cat ', ' Male-Cat']
c = re.findall(r'\bMale-Cat\b', a)
print(c)
['Male-Cat', 'Male-Cat', 'Male-Cat']
Other strings that are parsed correctly by the first approach:
a = 'Male-Cat Female-Cat Male-Cat-Female Male-Cat'
a = 'Male-Cat-Female'
a = 'Male-Cat'
Am I missing something? Can you explain what is wrong and what the correct way is?
Answers:
Use lookarounds to extract words inside whitespace boundaries:
r'(?<!\S)Male-Cat(?!\S)'
Details
(?<!\S) – a whitespace character or the start of the string must appear immediately to the left of the current location
Male-Cat – the term to search for
(?!\S) – a whitespace character or the end of the string must appear immediately to the right of the current location
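As a quick check, here is a minimal sketch that applies this pattern to the strings from the question. The helper name find_term is only for illustration, and re.escape is used on the assumption that the term should always be matched literally:

```python
import re

def find_term(term, text):
    # Hypothetical helper: match `term` only when it is delimited by
    # whitespace or the string boundaries on both sides.
    pattern = r'(?<!\S)' + re.escape(term) + r'(?!\S)'
    return re.findall(pattern, text)

print(find_term("Male-Cat", "Male-Cat Male-Cat Male-Cat-Female"))
# ['Male-Cat', 'Male-Cat']
print(find_term("Male-Cat", "Male-Cat Male-Cat Male-Cat"))
# ['Male-Cat', 'Male-Cat', 'Male-Cat']
print(find_term("Male-Cat", "Male-Cat-Female"))
# []
```

The third call returns nothing because the "-" after Male-Cat is a non-whitespace character, so the (?!\S) check fails.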
Since (?<!\S) and (?!\S) are zero-width assertions, the whitespace is not consumed, so consecutive matches are found.
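To see where the two attempts from the question go wrong, it may help to look at the match spans. This is only a sketch using re.finditer on the question's own strings; the variable names are made up for the example:

```python
import re

a = "Male-Cat Male-Cat Male-Cat"

# Consuming version: each match swallows the whitespace next to it, so the
# space after the first match is no longer available to start the second one.
consuming = [m.span() for m in re.finditer(r'(?:\s|^)Male-Cat(?:\s|$)', a)]
print(consuming)   # [(0, 9), (17, 26)]

# Lookaround version: the assertions only inspect the neighbouring characters.
zero_width = [m.span() for m in re.finditer(r'(?<!\S)Male-Cat(?!\S)', a)]
print(zero_width)  # [(0, 8), (9, 17), (18, 26)]

# \b version: "-" is a non-word character, so there is a word boundary
# between "t" and "-", and Male-Cat-Female is matched as well.
word_boundary = re.findall(r'\bMale-Cat\b', "Male-Cat-Female")
print(word_boundary)  # ['Male-Cat']
```

So the first pattern misses matches because adjacent matches compete for the same space, while the \b pattern over-matches because a hyphen counts as a word boundary.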