python – re.split a string with a keyword unless there is a specific keyword preceding it
Question:
Here is the code:
import re
text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
splitter = re.split('Sir|Mrs', text)
I want the text to be split by the words ‘Sir’ or ‘Mrs’ unless there is the string ‘married to’ before it.
Current output:
''
'John Doe, married to'
'Jane Doe,'
'Jack Doe,'
'Mary Doe'
Desired output:
''
'John Doe, married to Mrs Jane Doe,'
'Jack Doe,'
'Mary Doe'
Answers:
I would use an re.findall approach here:
import re
text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
matches = re.findall(r'\b(?:Sir|Mrs) \w+ \w+(?:, married to (?:Mrs|Sir) \w+ \w+)?', text)
print(matches)
This prints:
['Sir John Doe, married to Mrs Jane Doe', 'Sir Jack Doe', 'Mrs Mary Doe']
The regex pattern used here says to match:
\b(?:Sir|Mrs)    leading Sir/Mrs, starting at a word boundary
 \w+ \w+         first and last names
(?:
    , married to (?:Mrs|Sir) \w+ \w+    optional 'married to' followed by another title and name
)?               zero or one time
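If you would rather keep re.split, as the question asks, a negative lookbehind is one way to skip any title that directly follows 'married to'. This is only a sketch: it assumes the phrase is always exactly 'married to ' with a single trailing space, since Python's re module only accepts fixed-width lookbehinds. The names pattern and parts are just for illustration.

```python
import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"

# Split on Sir/Mrs only when the title is NOT directly preceded by "married to ".
# The lookbehind text is a fixed-width literal, which Python's re module requires.
pattern = r'(?<!married to )\b(?:Sir|Mrs)\b'

# re.split leaves the surrounding spaces on each piece, so strip them.
parts = [piece.strip() for piece in re.split(pattern, text)]
print(parts)
# ['', 'John Doe, married to Mrs Jane Doe,', 'Jack Doe,', 'Mary Doe']
```

Unlike the findall version, this drops the leading title from each piece and keeps the trailing commas, which matches the desired output in the question.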