Remove a particular word in a list of lists if it appears after a set of words

Question:

I have a list of words
list1=['duck','crow','hen','sparrow']
and a list of sentences
list2=[['The crow eats'],['Hen eats blue seeds'],['the duck is cute'],['she eats veggies']]
I want to remove every occurrence of the word ‘eats’ if it appears immediately after any of the words from the list.

desired output= [['The crow'],['Hen blue seeds'],['the duck is cute'],['she eats veggies']]

def remove_eats(test):
  for i in test:
    for j in i:
     for word in list1:
        j=j.replace(word + " eats", word)
        print(j)
        break

remove_eats(list2)

The replace method is not really working for the strings. Could you help me out? Is it possible with regex?

Asked By: Codingamethyst

||

Answers:

You can use a regex such as the one below, which uses an alternation of positive look-behinds.

(?:(?<=[Dd]uck)|(?<=[Cc]row)|(?<=[Hh]en)|(?<=[Ss]parrow))\s+eats?\b

Python example, using the built-in re module:

import re

list1 = ['duck', 'crow', 'hen', 'sparrow']

look_behinds = '|'.join(f'(?<=[{w[0].swapcase()}{w[0]}]{w[1:]})'
                        for w in list1)

EATS_RE = re.compile(rf'(?:{look_behinds})\s+eats?\b')

sentences = [['The crow eats'],
             ['Hen eats blue seeds'],
             ['the duck is cute'],
             ['she eats veggies']]

repl_sentences = [[EATS_RE.sub('', s, 1) for s in x] for x in sentences]
print(repl_sentences)

Out:

[['The crow'], ['Hen blue seeds'], ['the duck is cute'], ['she eats veggies']]
Answered By: rv.kvetch

Another option is to use a case-insensitive pattern and generate an alternation of the words that you want to keep, wrapped in a capture group.

Then match the word eats that follows it, and replace the whole match with capture group 1.

An example with the assembled pattern:

import re

pattern = r"\b(duck|crow|hen|sparrow)\s+eats\b"
list2 = [['The crow eats'], ['Hen eats blue seeds'], ['the duck is cute'], ['she eats veggies']]

# \1 puts back the captured word; flags is passed by keyword for clarity
res = [[re.sub(pattern, r"\1", s, flags=re.I) for s in lst] for lst in list2]

print(res)

Output

[
  ['The crow'],
  ['Hen blue seeds'],
  ['the duck is cute'],
  ['she eats veggies']
]
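
The pattern above is written out by hand. As a minimal sketch of the "generate an alternation" step, it could be built from list1 roughly like this; the use of re.escape and a precompiled pattern are assumptions added here, not part of the original answer.

```python
import re

list1 = ['duck', 'crow', 'hen', 'sparrow']
list2 = [['The crow eats'], ['Hen eats blue seeds'],
         ['the duck is cute'], ['she eats veggies']]

# re.escape guards against regex metacharacters in the words (assumed precaution)
alternation = '|'.join(re.escape(w) for w in list1)
pattern = re.compile(rf'\b({alternation})\s+eats\b', re.IGNORECASE)

# Replace "<word> eats" with just the captured word
res = [[pattern.sub(r'\1', s) for s in lst] for lst in list2]
print(res)
# [['The crow'], ['Hen blue seeds'], ['the duck is cute'], ['she eats veggies']]
```

Compiling once avoids rebuilding the pattern for every sentence, and the word list can change without editing the regex by hand.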


Answered By: The fourth bird