Python regular expression split by multiple delimiters

Question:

Given the sentence "I want to eat fish and I want to buy a car. Therefore, I have to make money."

I want to split the sentence into

['I want to eat fish', 'I want to buy a car', 'Therefore, I have to make money']

I am trying to split the sentence with:

re.split('.|and', sentence)

However, it splits the sentence by ‘.’, ‘a’, ‘n’, and ‘d’.

How can I split the sentence by ‘.’ and ‘and’?

Asked By: alryosha


Answers:

You need to escape the . in the regex.

import re

s = "I want to eat fish and I want to buy a car. Therefore, I have to make money."

re.split(r'\.|and', s)

Result:

['I want to eat fish ',
 ' I want to buy a car',
 ' Therefore, I have to make money',
 '']
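
The result above still carries surrounding spaces and a trailing empty string. As a minimal sketch, assuming you are happy to post-process the list rather than change the pattern, the pieces can be stripped and the empty ones dropped (the names parts and cleaned are just illustrative):

```python
import re

s = "I want to eat fish and I want to buy a car. Therefore, I have to make money."

# Split on a literal dot or the substring "and".
parts = re.split(r'\.|and', s)

# Strip surrounding whitespace and discard pieces that end up empty.
cleaned = [p.strip() for p in parts if p.strip()]
print(cleaned)
# ['I want to eat fish', 'I want to buy a car', 'Therefore, I have to make money']
```
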
Answered By: Dan Nagle

In addition to escaping the dot (.), which matches any non-newline character in regex, you should also match any whitespace around the delimiter so that the split consumes it and the results carry no leading or trailing spaces. Add a positive lookahead that asserts a following non-whitespace character, so that the trailing dot at the end of the sentence is not treated as a delimiter:

re.split(r'\s*(?:\.|and)\s*(?=\S)', sentence)

This returns:

['I want to eat fish', 'I want to buy a car', 'Therefore, I have to make money.']
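
One caveat: a bare and in the pattern also matches inside longer words such as "sandwich". A possible variation, not part of the original answer, is to wrap it in word boundaries (\b). The sentence below is a made-up example chosen to show the difference:

```python
import re

# Hypothetical sentence with "and" hidden inside "sandwich" and "candy".
text = "I want a sandwich and I want candy. Then I am done."

# \band\b only matches "and" as a standalone word; the rest of the
# pattern is the same as above (surrounding whitespace is consumed,
# and the lookahead skips the final dot).
pattern = r'\s*(?:\.|\band\b)\s*(?=\S)'
result = re.split(pattern, text)
print(result)
# ['I want a sandwich', 'I want candy', 'Then I am done.']
```

Without the word boundaries, the same sentence would also be split in the middle of "sandwich" and "candy".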

Demo: https://replit.com/@blhsing/LimitedVastCookies

Answered By: blhsing