Remove a pattern if the text does not contain specific words

Question:

I need to remove everything from a given text after a specific pattern, but only if the text doesn’t include specific words. For example, I need to remove everything from a number onward if the text doesn’t include "key1" and "key2".

txt1 = "this is a number 123456789 and there aren't any keys here. we might have a lot of words here as well but no key words"

Neither key1 nor key2 appears in this text, so the output for txt1 should be:

out1 = "this is a number"

txt2 = "this is a number 123456789 but we have their key1 here. key2 might be in the second or the third sentence. hence we can't remove everything after the given number"

Both key1 and key2 appear in the above text, so the output for txt2 should be the unchanged text:

out2 = "this is a number 123456789 but we have their key1 here. key2 might be in the second or the third sentence. hence we can't remove everything after the given number"

I tried to use a negative lookahead as below, but it didn’t work.

re.sub(r'\d+.*(?!key1|key2).*', '', txt)
Asked By: Naik

||

Answers:

(?=^(?:(?!key[12]).)*$)^.*?(?=\s\d+)

Short Explanation

  • (?=^(?:(?!key[12]).)*$) Assert that the string contains neither key1 nor key2
  • ^.*?(?=\s\d+) Lazily match from the start of the string up to the whitespace that precedes the first run of digits
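
The two parts can also be checked on their own. The following is a minimal sketch, not part of the original answer; the names no_keys and prefix and the sample strings are made up for illustration.

```python
import re

# Part 1: zero-width assertion that succeeds only when no position
# in the string starts "key1" or "key2".
no_keys = re.compile(r"(?=^(?:(?!key[12]).)*$)")

# Part 2: lazy match from the start up to the whitespace before the first digits.
prefix = re.compile(r"^.*?(?=\s\d+)")

without_keys = "a number 42 and nothing else"  # illustrative sample
with_keys = "a number 42 and key1 here"        # illustrative sample

assertion_passes = no_keys.match(without_keys) is not None
assertion_blocks = no_keys.match(with_keys) is None
prefix_text = prefix.match(with_keys).group()

print(assertion_passes, assertion_blocks, repr(prefix_text))
```

Used alone, the second part cuts the text regardless of the keys; it is the first part that stops the combined pattern from matching when a key is present.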

See the regex demo

Python Example

import re

strings = [
    "this is a number 123456789 and there aren't any keys here. we might have a lot of words here as well but no key words",
    "this is a number 123456789 but we have their key1 here. key2 might be in the second or the third sentence. hence we can't remove everything after the given number",
]

for string in strings:
    match = re.search(r"(?=^(?:(?!key[12]).)*$)^.*?(?=\s\d+)", string)
    output = match.group() if match else string
    print(output)
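
If the keywords need to vary, the same behavior can be sketched without a lookahead by testing for the keywords first and truncating afterwards. This is an alternative under stated assumptions, not the approach from the answer above: the function name strip_after_number and its keywords parameter are hypothetical, a single keyword is assumed to be enough to keep the text (as in the regex above), and re.DOTALL is assumed so that text on later lines is removed too.

```python
import re

def strip_after_number(text, keywords=("key1", "key2")):
    """Cut text before its first number unless any keyword is present."""
    # Hypothetical helper: keep the text unchanged if any keyword occurs.
    if any(keyword in text for keyword in keywords):
        return text
    # Otherwise drop the first whitespace-plus-digits run and everything after it.
    return re.sub(r"\s\d+.*", "", text, count=1, flags=re.DOTALL)

print(strip_after_number("this is a number 123456789 and no keys here"))
print(strip_after_number("this is a number 123456789 but key1 is here"))
```

Text that contains no number at all is returned unchanged, because the substitution finds nothing to replace.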
Answered By: Artyom Vancyan