How to create a regex pattern that removes elements equal to a given substring, but only when they appear inside an enumeration in the input string?

Question:

I'm having trouble creating a regex that, given a string containing an enumeration (element, element, element, different_element, ... and element), keeps only the enumeration elements other than hyjk11l.

Example 1:

Input string:

"I come with hyjk11l, Mary Johnson, hyjk11l, hyjk11l, hyjk11l and hyjk11l to the center, maybe we'll buy something there"

The output that I need:

"I come with Mary Johnson to the center, maybe we'll buy something there"

Example 2:

Input string:

"In afternoon, I show hyjk11l, John, hyjk11l, and hyjk11l in the lab"

The output that I need:

"In afternoon, I show John in the lab"

Example 3:

Input string:

"I meet with Katy Perry and hyjk11l here"

The output that I need:

"I meet with Katy Perry here"

I have tried using the replace() function and some regex combinations, but I don't get the desired result. I think I could use replace() to remove every ", hyjk11l", "hyjk11l", ", and hyjk11l" and/or "and hyjk11l", but that seems complicated because I don't know how many times I would have to do it. The solution needs to be general: the input string is not known in advance, which is why I want a regex.

Answers:

Here is what you can do:

import re

inputs = ["I come with hyjk11l, Mary Johnson, hyjk11l, hyjk11l, hyjk11l and hyjk11l to the center, maybe we'll buy something there",
          "In afternoon, I show hyjk11l, John, hyjk11l, and hyjk11l in the lab",
          "I meet with Katy Perry and hyjk11l here",
          "I meet him and hyjk11l and her there"
         ]

# Remove each hyjk11l together with the separator in front of it (whitespace, comma or "and")
pat = r"((?:[\s+,]|\s?and)\s?hyjk11l(?:[\s,]?)(?=\s))"
for inp in inputs:
    tmp = re.sub(pat, "", inp)
    print(tmp)

Output:

I come with Mary Johnson to the center, maybe we'll buy something there
In afternoon, I show John in the lab
I meet with Katy Perry here
I meet him and her there

Check the regex at Regex101.

Explanation pattern:

  • (?:[\s+,]|\s?and) : non-capturing group; matches either one character that is a whitespace, a literal + or a comma (inside a character class, + is not a quantifier), OR 0 or 1 whitespace followed by and
  • \s? : 0 or 1 whitespace
  • hyjk11l : match this word
  • (?:[\s,]?) : non-capturing group, match 0 or 1 whitespace or comma
  • (?=\s) : lookahead; the match must be followed by a whitespace, which is not consumed
  • the whole pattern forms one group, which is replaced with ""
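As a rough sketch of how the pattern above could be reused for an arbitrary placeholder, the snippet below wraps it in a helper. The name remove_placeholder and its token parameter are hypothetical and not part of the answer. The character class is simplified to [\s,] (dropping the literal +), and re.escape is used on the assumption that the placeholder may contain regex metacharacters.

```python
import re


def remove_placeholder(text: str, token: str) -> str:
    """Remove every occurrence of token from an enumeration in text.

    Hypothetical helper: same structure as the pattern explained above,
    with the placeholder interpolated instead of hard-coded.
    """
    pattern = (
        r"(?:[\s,]|\s?and)"   # separator in front: whitespace, comma, or "and"
        r"\s?"                # optional whitespace before the token
        + re.escape(token)    # the placeholder itself, taken literally
        + r"[\s,]?"           # optional trailing whitespace or comma
        r"(?=\s)"             # must be followed by whitespace (not consumed)
    )
    return re.sub(pattern, "", text)


print(remove_placeholder("I meet with Katy Perry and hyjk11l here", "hyjk11l"))
print(remove_placeholder("I meet him and hyjk11l and her there", "hyjk11l"))
```

Because of the (?=\s) lookahead, a placeholder at the very end of the string or directly before a full stop is left untouched; handling those cases would need a different lookahead.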
Answered By: Rabinzel
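import re

from ordered_set import OrderedSet  # third-party package

# inputs is the same list as in the answer above
for s in inputs:
    # remove hyjk11l (matched as a word containing digits) and a possible comma after
    s = re.sub(r'\b\w+\d+.*?\b,?', '', s)
    # remove repeated words (the second **and** is removed)
    s = ' '.join(OrderedSet(s.split()))
    print(s)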

Output:
I come with Mary Johnson, and to the center, maybe we'll buy something there
In afternoon, I show John, and in the lab
I meet with Katy Perry and here
I meet him and her there
Answered By: LetzerWille
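To make the de-duplication step of this second approach easier to inspect, here is a minimal sketch that reproduces it with the standard library only. The helper name dedupe_words is hypothetical, and dict.fromkeys is swapped in for OrderedSet on the assumption that only first-occurrence order matters. It also shows that the step drops every repeated word, not just a doubled "and".

```python
def dedupe_words(text: str) -> str:
    """Keep only the first occurrence of each whitespace-separated word.

    Hypothetical stand-in for ' '.join(OrderedSet(text.split())):
    dict keys preserve insertion order, so dict.fromkeys acts as an ordered set.
    """
    return " ".join(dict.fromkeys(text.split()))


# The doubled "and" left behind after removing hyjk11l is collapsed...
print(dedupe_words("I meet him and  and her there"))
# ...but so is any other legitimately repeated word, such as "the".
print(dedupe_words("the cat and the dog"))
```

So this step is only safe when the input sentence does not otherwise repeat a word.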