python – re.split a string with a keyword unless there is a specific keyword preceding it

Question:

Here is the code:

import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
splitter = re.split('Sir|Mrs', text)

I want to split the text on the words ‘Sir’ or ‘Mrs’, except when the word is immediately preceded by the string ‘married to’.

Current output:

''
'John Doe, married to'
'Jane Doe,'
'Jack Doe,'
'Mary Doe'

Desired output:

''
'John Doe, married to Mrs Jane Doe,'
'Jack Doe,'
'Mary Doe'
Asked By: user16779293


Answers:

I would use an re.findall approach here:

import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
matches = re.findall(r'\b(?:Sir|Mrs) \w+ \w+(?:, married to (?:Mrs|Sir) \w+ \w+)?', text)
print(matches)

This prints:

['Sir John Doe, married to Mrs Jane Doe', 'Sir Jack Doe', 'Mrs Mary Doe']

The regex pattern used here says to match:

\b(?:Sir|Mrs)                            leading Sir/Mrs
   \w+ \w+                               first and last names
(?:
    , married to (?:Mrs|Sir) \w+ \w+     optional 'married to' followed by another name
)?                                       zero or one time
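
The breakdown above can also be kept inside the code itself. The following is a minimal sketch, assuming the same sample text, that writes the pattern with re.VERBOSE so each piece carries its own comment. Because verbose mode ignores unescaped whitespace, the literal spaces are written as "[ ]" here; that spelling is a choice made for this sketch, not part of the original answer.

```python
import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"

# Same pattern as the one-line version, spread out with re.VERBOSE.
# Unescaped whitespace is ignored in verbose mode, so each literal
# space is written as "[ ]".
pattern = re.compile(r"""
    \b(?:Sir|Mrs)        # leading Sir/Mrs
    [ ]\w+[ ]\w+         # first and last names
    (?:
        ,[ ]married[ ]to[ ](?:Mrs|Sir)[ ]\w+[ ]\w+   # optional 'married to' plus another name
    )?                   # zero or one time
""", re.VERBOSE)

matches = pattern.findall(text)
print(matches)
```

Note that this pattern assumes every name is exactly two words; names with middle names or hyphens would need a looser pattern.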
Answered By: Tim Biegeleisen
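
If staying with re.split, as in the question, is preferred, one possible alternative (not part of the answer above) is a negative lookbehind that refuses to split when the title directly follows 'married to '. This is a sketch under the assumption that the phrase always appears exactly as 'married to ' with a single trailing space, since Python's re module only accepts fixed-width lookbehinds.

```python
import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"

# Split on Sir/Mrs only when it is NOT directly preceded by "married to ".
# The lookbehind must be fixed-width, so the phrase is matched literally.
parts = re.split(r'(?<!married to )\b(?:Sir|Mrs)\b', text)

# Trim the spaces left around each piece by the split.
parts = [p.strip() for p in parts]
print(parts)
```

The leading empty string is kept because the text starts with a title, which matches the desired output in the question.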