Loop through list of strings, remove all banned words from each string item

Question:

I have the following list:

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]

This is a list of words that I want to remove from each of the string items in the list:

bannedWord = ['grated', 'zested', 'thinly', 'chopped', ',']

The resulting list that I am trying to generate is this:

cleaner_list = ["lemons", "cheddar cheese", "carrots"]

So far, I have been unable to achieve this. My attempt is as follows:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []
    
def RemoveBannedWords(ing):
    pattern = re.compile(r"\b(grated|zested|thinly|chopped)\W", re.I)
    return pattern.sub("", ing)
    
for ing in dirtylist:
    cleaner_ing = RemoveBannedWords(ing)
    cleaner_list.append(cleaner_ing)
    
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

I have also tried:

import re

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
cleaner_list = []

bannedWord = ['grated', 'zested', 'thinly', 'chopped']
re_banned_words = re.compile(r"\b(" + "|".join(bannedWord) + r")\W", re.I)

def remove_words(ing):
    global re_banned_words
    return re_banned_words.sub("", ing)

for ing in dirtylist:
    cleaner_ing = remove_words(ing)
    cleaner_list.append(cleaner_ing)
  
print(cleaner_list)

This returns:

['lemons zested', 'cheddar cheese', 'carrots, chopped']

I’m a bit lost at this point and not sure where I’m going wrong. Any help is much appreciated.

Asked By: JimmyStrings

Answers:

def clearList(dirtyList, bannedWords, splitChar):
    clean = []
    for dirty in dirtyList:
        ban = False
        for w in dirty.split(splitChar):
            if w in bannedWords:
                ban = True

        if ban is False:
            clean.append(dirty)

    return clean

dirtyList is the list that you want to clean.

bannedWords are the words that you don't want.

splitChar is the character that separates the words (" ").
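
The function above skips any item that contains a banned word. If the goal is instead to drop only the banned words and keep the rest of each item, the same split-and-check idea can be applied per word. A minimal sketch, assuming words are separated by a single split character and that commas are the only punctuation to trim; remove_banned_words is a hypothetical helper name:

```python
def remove_banned_words(items, banned_words, split_char=" "):
    """Return a new list with banned words removed from each item."""
    # A lowercase set gives fast, case-insensitive membership checks.
    banned = {w.lower() for w in banned_words}
    cleaned = []
    for item in items:
        kept = []
        for word in item.split(split_char):
            bare = word.strip(",")  # assumption: commas are the only punctuation to trim
            if bare and bare.lower() not in banned:
                kept.append(bare)
        cleaned.append(split_char.join(kept))
    return cleaned


dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWord = ["grated", "zested", "thinly", "chopped"]
print(remove_banned_words(dirtylist, bannedWord))
```

Note that this also strips commas from the words it keeps, which may or may not be what you want for other inputs.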

Answered By: Sarper Makas

I would remove "," from the bannedWord list and use str.strip to strip it:

import re

dirtylist = [
    "lemons zested",
    "grated cheddar cheese",
    "carrots, thinly chopped",
]

bannedWord = ["grated", "zested", "thinly", "chopped"]

pat = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in bannedWord) + r")\b", flags=re.I
)

for w in dirtylist:
    print("{:<30} {}".format(w, pat.sub("", w).strip(" ,")))

Prints:

lemons zested                  lemons
grated cheddar cheese          cheddar cheese
carrots, thinly chopped        carrots

Answered By: Andrej Kesely

Some issues:

  • The final \W in your regex requires that a character follows the banned word. So if the banned word is the last word in the input string, the match fails. You could just use \b again, like you did at the start of the regex.

  • Since you wanted to replace the comma as well, you need to add it as an option. Make sure to not put it inside that same capture group, as then \b at the end would require that comma to be followed by an alphanumerical character. So it should be put as an option right at the very end (or start) of your regex.

  • You might want to call .strip() on the resulting string to remove any white space that remains after the banned words have been removed.

So:

import re

def RemoveBannedWords(ing):
    # Raw string so that \b is a regex word boundary, not a backspace character
    pattern = re.compile(r"\b(grated|zested|thinly|chopped)\b|,", re.I)
    return pattern.sub("", ing).strip()
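
The same idea can be driven from a list instead of a hard-coded alternation. The following is only a sketch under a few assumptions: the names banned_words, banned_pattern and clean_ingredient are hypothetical, the pattern is compiled once at module level, and leftover runs of whitespace inside the string are collapsed as well as trimmed.

```python
import re

banned_words = ["grated", "zested", "thinly", "chopped"]

# Escape each word so any regex metacharacters in the list are matched literally.
alternation = "|".join(re.escape(w) for w in banned_words)
# \b on both sides restricts matches to whole words; the comma is a separate option.
banned_pattern = re.compile(r"\b(?:" + alternation + r")\b|,", re.I)


def clean_ingredient(ing):
    """Remove banned words and commas, then collapse leftover whitespace."""
    stripped = banned_pattern.sub("", ing)
    return " ".join(stripped.split())


dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
print([clean_ingredient(ing) for ing in dirtylist])
```

Compiling once outside the function avoids rebuilding the pattern for every item, and " ".join(stripped.split()) also handles a banned word removed from the middle of a string.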

Answered By: trincot

The code below seems to work (a naive nested loop):

dirtylist = ["lemons zested", "grated cheddar cheese", "carrots, thinly chopped"]
bannedWords = ['grated', 'zested', 'thinly', 'chopped', ',']
result = []
for words in dirtylist:
    temp = words
    for bannedWord in bannedWords:
        temp = temp.replace(bannedWord, '')
    result.append(temp.strip())
print(result)

Output:

['lemons', 'cheddar cheese', 'carrots']
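
One caveat with str.replace is that it matches substrings, so a banned word embedded in a longer word is removed too. The sketch below illustrates the difference against a word-boundary regex; the sample string and both helper names are hypothetical, chosen only for illustration.

```python
import re

banned_words = ["grated", "zested", "thinly", "chopped"]


def replace_substrings(text):
    """Naive approach: remove every occurrence, even inside longer words."""
    for w in banned_words:
        text = text.replace(w, "")
    return text.strip()


# \b anchors ensure that only whole words are matched.
word_pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, banned_words)) + r")\b")


def replace_whole_words(text):
    """Word-boundary approach: leave longer words that merely contain a banned word."""
    return word_pattern.sub("", text).strip()


sample = "ungrated parmesan"  # hypothetical input
print(replace_substrings(sample))   # un parmesan
print(replace_whole_words(sample))  # ungrated parmesan
```

For the three example strings in the question both approaches give the same result; the difference only shows up on inputs like the sample.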

Answered By: balderman