Python regex to remove text between some pattern

Question:

I have text in the following format.

|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text

I want to remove all text between |start| and |end|.

I have tried the following regex.

regex = r'(?<=\|start\|).+(?=\|end\|)'
re.sub(regex, '', text)

It returns

“Again some free text”

But I expect it to return

this is another text. Again some free text

Asked By: Hima


Answers:

Note that the start/end delimiters are inside lookaround constructs in your pattern, so they are not consumed and will remain in the resulting string after re.sub. You should convert the lookbehind and lookahead into consuming patterns. The pipes must also be escaped (\|), since an unescaped | means alternation.

Also, you seem to want to remove special chars after the right-hand delimiter, so you need to add [^\w\s]* at the end of the regex.

You may use

import re
text = """|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text"""
print( re.sub(r'(?s)\|start\|.*?\|end\|[^\w\s]*', '', text).replace('\n', '') )
# => this is another text. Again some free text

See the Python demo.

Regex details

  • (?s) – inline DOTALL modifier (. also matches line breaks)
  • \|start\| – |start| text
  • .*? – any 0+ chars, as few as possible
  • \|end\| – |end| text
  • [^\w\s]* – 0 or more chars other than word and whitespace chars.
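
To make the lookaround point concrete, here is a minimal sketch on a shortened one-line sample (the sample string and the helper name strip_between are made up for illustration, not part of the answer). It compares a lookaround pattern with a consuming one, and assumes re.escape is an acceptable way to avoid escaping the delimiters by hand.

```python
import re

# Shortened sample, assumed for illustration only
text = "|start| remove me |end|. keep me"

# Lookarounds only assert the delimiters; they are not consumed, so they survive.
lookaround = re.sub(r'(?s)(?<=\|start\|).*?(?=\|end\|)', '', text)
print(repr(lookaround))  # '|start||end|. keep me'

# Consuming the delimiters removes them together with the enclosed text.
consuming = re.sub(r'(?s)\|start\|.*?\|end\|[^\w\s]*', '', text)
print(repr(consuming))   # ' keep me'


def strip_between(s, start, end):
    """Hypothetical helper: remove start...end blocks plus trailing punctuation."""
    pattern = re.compile(
        re.escape(start) + r'.*?' + re.escape(end) + r'[^\w\s]*',
        re.DOTALL,  # same effect as the inline (?s) modifier
    )
    return pattern.sub('', s)


print(repr(strip_between(text, '|start|', '|end|')))  # ' keep me'
```

The helper builds the same pattern as above; re.escape just turns |start| into \|start\| so the pipes are matched literally.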
Answered By: Wiktor Stribiżew

Try this:

import re

your_string = """|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text"""

# Escape the pipes so they match literally; \. matches the period after |end|
regex = r'(\|start\|).+(\|end\|\.)'

result = re.sub(regex, '', your_string).replace('\n', '')

print(result)

Outputs:

this is another text. Again some free text
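
One caveat, sketched under an assumption: the greedy .+ works on the sample only because each |start| ... |end| block sits on its own line and . does not cross line breaks by default. If two blocks ever share a line (the input below is hypothetical), the greedy form also removes the text between them, while a lazy .+? does not.

```python
import re

# Hypothetical input: two marked blocks on the same line
line = "|start| a |end|. keep |start| b |end|. tail"

# Greedy: .+ runs to the LAST |end|. on the line, swallowing " keep " too.
greedy = re.sub(r'\|start\|.+\|end\|\.', '', line)
print(repr(greedy))  # ' tail'

# Lazy: .+? stops at the FIRST |end|. after each |start|.
lazy = re.sub(r'\|start\|.+?\|end\|\.', '', line)
print(repr(lazy))    # ' keep  tail'
```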
Answered By: Rithin Chalumuri