Python regex not capturing groups properly
Question:
I have the following regex:
(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))
Input text examples:
- RE:11567 Miss Jane Doe 12345678
- Reference: Miss Jane Doe 12345678
- RE:J123 Miss Jane Doe 12345678
- RE:J123 Miss Jane Doe 12345678 Reference: Test Company
Sample Code:
import re
# Raw string (r'...') so the backslashes reach the regex engine unchanged
pattern = re.compile(r'(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))')
result = pattern.findall('RE:11693 Miss Jane Doe 12345678')
For all four examples I expect the output ('Miss Jane Doe', 'Miss', 'Jane', 'Doe'). However, for the fourth example I get [('Miss Jane Doe', 'Miss', 'Jane', 'Doe'), (' Test Company', '', 'Test', 'Company')].
How can I get the correct output?
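A minimal reproduction of the problem, assuming the backslashes in the posted regex were lost when the question was formatted (with \w and \s restored, the pattern produces exactly the output reported above). It runs findall over all four inputs. Only the fourth input contains a second "Reference:" later in the string, and findall returns a tuple for every non-overlapping match it finds.

```python
import re

# The question's regex, with its backslashes restored (an assumption about the original post).
pattern = re.compile(r'(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))')

samples = [
    'RE:11567 Miss Jane Doe 12345678',
    'Reference: Miss Jane Doe 12345678',
    'RE:J123 Miss Jane Doe 12345678',
    'RE:J123 Miss Jane Doe 12345678 Reference: Test Company',
]

# findall scans the whole string and collects the groups of every match it finds.
results = [pattern.findall(text) for text in samples]
for text, found in zip(samples, results):
    print(text, '->', found)
```

The leading space in ' Test Company' comes from backtracking. \s* first consumes the space after "Reference:". The optional title group then cannot match "Test", and \s+ still needs at least one whitespace character. So \s* gives the space back, and group 1 starts at that space.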
Answers:
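Just add ^ to the start of the regex. findall returns every non-overlapping match in the string, and in the fourth example the later "Reference: Test Company" forms a second match. Without the re.MULTILINE flag, ^ only matches at the very beginning of the string, so that second match is no longer possible. The regex becomes:
^(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))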
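A short check of the anchored pattern against all four inputs. It makes the same assumption that the post lost its backslashes. The re.match part is an alternative that the answer above does not mention. Pattern.match only tries to match at the start of the string, so it behaves like the ^ version even without the anchor. It returns a single Match object, and its .groups() method gives the tuple directly. Note one difference: .groups() reports a group that did not participate as None, whereas findall reports it as ''.

```python
import re

# The answer's pattern: without re.MULTILINE, ^ matches only at the start of the string.
anchored = re.compile(r'^(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))')

samples = [
    'RE:11567 Miss Jane Doe 12345678',
    'Reference: Miss Jane Doe 12345678',
    'RE:J123 Miss Jane Doe 12345678',
    'RE:J123 Miss Jane Doe 12345678 Reference: Test Company',
]

# findall still returns a list, but with the anchor it can hold at most one tuple.
results = [anchored.findall(text) for text in samples]
for text, found in zip(samples, results):
    print(text, '->', found)

# Alternative: Pattern.match only attempts a match at position 0, so the
# unanchored pattern gives the same result and returns a Match, not a list.
unanchored = re.compile(r'(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))')
m = unanchored.match(samples[3])
groups = m.groups() if m else None
print(groups)
```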