Remove a particular word in a list of lists if it appears after a set of words
Question:
I have a list of words
list1=['duck','crow','hen','sparrow']
and a list of sentences
list2=[['The crow eats'],['Hen eats blue seeds'],['the duck is cute'],['she eats veggies']]
I want to remove every occurrence of the word ‘eats’ if it appears immediately after any of the words from the list.
desired output= [['The crow','Hen blue seeds','the duck is cute'],['she eats veggies']]
def remove_eats(test):
    for i in test:
        for j in i:
            for word in list1:
                j=j.replace(word + " eats", word)
                print(j)
                break
remove_eats(list2)
The replace method is not really working for the strings. Could you help me out? Is it possible with regex?
Answers:
You can use a regex such as the one below, which has a series of alternating positive look-behinds.
(?:(?<=[Dd]uck)|(?<=[Cc]row)|(?<=[Hh]en)|(?<=[Ss]parrow))\s+eats?\b
Python example, using the built-in re module:
import re
list1 = ['duck', 'crow', 'hen', 'sparrow']
# One look-behind per word, accepting either case for the first letter
look_behinds = '|'.join(f'(?<=[{w[0].swapcase()}{w[0]}]{w[1:]})'
                        for w in list1)
EATS_RE = re.compile(rf'(?:{look_behinds})\s+eats?\b')
sentences = [['The crow eats'],
             ['Hen eats blue seeds'],
             ['the duck is cute'],
             ['she eats veggies']]
# count=1 removes only the first match in each sentence
repl_sentences = [[EATS_RE.sub('', s, count=1) for s in x] for x in sentences]
print(repl_sentences)
Out:
[['The crow'], ['Hen blue seeds'], ['the duck is cute'], ['she eats veggies']]
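One caveat with the look-behind pattern above: nothing anchors the start of each word, so a longer word that merely ends in one of the listed words (for example "scarecrow") would also trigger the removal. The sketch below is a hedged variant, not part of the original answer, that places a `\b` inside each look-behind. The names `BOUNDED_RE` and `strip_eats` are made up for illustration.

```python
import re

list1 = ['duck', 'crow', 'hen', 'sparrow']

# Assumption: every word is non-empty and starts with a letter.
# \b inside the look-behind requires the listed word to start at a word boundary.
bounded = '|'.join(
    rf'(?<=\b[{w[0].swapcase()}{w[0]}]{re.escape(w[1:])})' for w in list1
)
BOUNDED_RE = re.compile(rf'(?:{bounded})\s+eats?\b')


def strip_eats(sentence):
    """Remove the first 'eats' that directly follows a listed word."""
    return BOUNDED_RE.sub('', sentence, count=1)


print(strip_eats('The crow eats'))       # 'crow' is a whole word here
print(strip_eats('The scarecrow eats'))  # 'crow' is only a suffix here
```

With this variant the first sentence loses " eats", while the second is left untouched because "crow" is not preceded by a word boundary.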
Another option is to use a case-insensitive pattern and build an alternation of the words you want to keep inside a capture group.
Then match the word eats that follows it and replace the whole match with capture group 1, so only eats is dropped.
An example with the assembled pattern:
import re
pattern = r"\b(duck|crow|hen|sparrow)\s+eats\b"
list2 = [['The crow eats'], ['Hen eats blue seeds'], ['the duck is cute'], ['she eats veggies']]
# \1 puts back the captured word; flags is passed by keyword
res = [[re.sub(pattern, r"\1", s, flags=re.I) for s in lst] for lst in list2]
print(res)
Output:
[
    ['The crow'],
    ['Hen blue seeds'],
    ['the duck is cute'],
    ['she eats veggies']
]
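The answer above describes generating the alternation from the word list, but its example hard-codes the words. A minimal sketch of the generated version follows, assuming the words should be matched literally (hence `re.escape`). The helper names `build_pattern` and `remove_eats` are illustrative only.

```python
import re

list1 = ['duck', 'crow', 'hen', 'sparrow']
list2 = [['The crow eats'], ['Hen eats blue seeds'],
         ['the duck is cute'], ['she eats veggies']]


def build_pattern(words):
    """Compile a case-insensitive pattern: (word1|word2|...) followed by 'eats'."""
    alternation = '|'.join(re.escape(w) for w in words)
    return re.compile(rf'\b({alternation})\s+eats\b', flags=re.IGNORECASE)


def remove_eats(nested, words):
    """Return a new list of lists with 'eats' removed after any listed word."""
    pattern = build_pattern(words)
    # \1 restores the captured word, so only the whitespace and 'eats' are dropped
    return [[pattern.sub(r'\1', s) for s in inner] for inner in nested]


print(remove_eats(list2, list1))
```

Compiling the pattern once and returning a new nested list also sidesteps the issue in the question's loop, where the result of `replace` was assigned to a local variable and never stored.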