Remove stopwords using a for loop in Python

Question:

I have recently been studying Python loops, and I want to see whether I can use a for loop to remove stop words and punctuation.

However, my code doesn't work: it raises an error saying that punctuation is not defined. (The stop words are now removed correctly, thanks to the comment.)

I know it can be achieved using a list comprehension, and there are a lot of answers on Stack Overflow, but I want to know how it would be achieved using a for loop.

The code I used for practice is below:

texts = 'I love python, From John'

stopword = ['i', 'from']
punctuations = ','

texts = texts.split()

empty_list = []
    
for text in texts:
    text = text.lower()
    if text not in stopword and punctuation not in punctuations:
        empty_list.append(text)
        print(empty_list)

The expected output would be: love python John

Thanks for the help.

Asked By: Peter

Answers:

You are iterating through the characters of the string with that loop.
You should use the .split() method to split the string into words (splitting on ' ', for example); you will then be able to .pop() the unwanted words from the list created by .split().
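
A minimal sketch of that split-then-pop idea, assuming the stop word list and punctuation string from the question. The names `words`, `index`, `cleaned` and `result` are only for illustration, and stripping the punctuation with str.strip() is one possible choice, not something this answer prescribes:

```python
texts = 'I love python, From John'
stopword = ['i', 'from']
punctuations = ','

words = texts.split()

# Walk the indices backwards so that pop() never shifts a word
# that has not been visited yet.
for index in range(len(words) - 1, -1, -1):
    # Drop leading/trailing punctuation before comparing.
    cleaned = words[index].strip(punctuations)
    if cleaned.lower() in stopword:
        words.pop(index)
    else:
        words[index] = cleaned

result = ' '.join(words)
print(result)  # love python John
```

Going from the last index to the first matters here: popping while counting upwards would move the following word into the slot that was just visited.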

Answered By: RdVL

You can try this one:

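import string

texts = 'I love python, From John'
stopword = ['i', 'from']  # two separate entries, not the single string 'i, from'

# Strip all punctuation, lowercase the text, then split it into words
text = texts.translate(str.maketrans('', '', string.punctuation)).lower().split(' ')

empty_list = [word for word in text if word not in stopword]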

I added the punctuation remover to handle tokens such as ',from' or 'from?', which would otherwise not match the stop word 'from'.
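
To illustrate why the punctuation has to go before the comparison, here is a small sketch that checks a few tokens against the stop word list with and without cleaning. The sample tokens and the names `remove_punctuation` and `matches` are assumptions made for the example:

```python
import string

stopword = ['i', 'from']

# Translation table that deletes every character in string.punctuation.
remove_punctuation = str.maketrans('', '', string.punctuation)

matches = []
for raw in ['from', ',from', 'from?']:
    cleaned = raw.translate(remove_punctuation).lower()
    # Record: the token, whether it matches as-is, whether it matches once cleaned.
    matches.append((raw, raw in stopword, cleaned in stopword))

for row in matches:
    print(row)
# ('from', True, True)
# (',from', False, True)
# ('from?', False, True)
```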

Answered By: imburningbabe

You could do it as a list comprehension and avoid a for loop like this:

texts = 'I love python, from John'
stopword = ['I', 'from']
punctuations = [',']

[word for word in texts.split(' ') if word not in stopword and word not in punctuations]

# ['love', 'python,', 'John']
# Note: the comma stays attached to 'python,' because only whole tokens are compared.

Answered By: Nick

So, if I understood your question correctly, the answer is to use the pop instruction once you have found a word that is in stopword.

The first mistake was in the stopword list, because you had only one element there. Elements need to be separated, as in the following example:

words = ['one', 'two', 'three']

Beyond that, you also need an instruction like remove to delete the unwanted elements.

So the implementation would be the following:

texts = 'I love python, From John'

stopword = ['i', 'from']

texts = texts.split()

empty_list = []

# Iterate over a copy (texts[:]) so that removing from texts does not skip words
for text in texts[:]:
    text_original = text
    text = text.lower()
    if text not in stopword:
        empty_list.append(text_original)
        # print(empty_list)
    else:
        texts.remove(text_original)

print(texts, empty_list)
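
One thing to watch out for when calling remove inside a for loop: removing from the very list that is being iterated makes the loop skip the element that follows each removal. A minimal sketch of that effect, next to the variant that iterates over a copy (the names `skipping`, `safe`, `visited_skipping` and `visited_safe` are only for illustration):

```python
stopword = ['i', 'from']

# Removing from the list being iterated: the word after each removal is skipped.
skipping = 'I love python, From John'.split()
visited_skipping = []
for word in skipping:
    visited_skipping.append(word)
    if word.lower() in stopword:
        skipping.remove(word)

# Iterating over a copy (list(safe)) visits every word exactly once.
safe = 'I love python, From John'.split()
visited_safe = []
for word in list(safe):
    visited_safe.append(word)
    if word.lower() in stopword:
        safe.remove(word)

print(visited_skipping)  # ['I', 'python,', 'From']
print(visited_safe)      # ['I', 'love', 'python,', 'From', 'John']
print(safe)              # ['love', 'python,', 'John']
```

In this particular input the remaining list happens to look the same either way, but any per-word work done inside the loop (such as appending to a second list) silently misses the skipped words.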

Hope this helps.

Answered By: vosi23

Punctuation can be attached to words, so use replace to remove it first.

Then split the text into words and, for each word, check whether its lowercase form is in the stop word list.

text = 'I love python, From John'
stopwords = ['i', 'from']
punctuations = [',']

# first remove punctuation
for p in punctuations:
    text = text.replace(p, "")
# then split the text string and, for each element, check whether it is in stopwords

result = []
for word in text.split():
    if word.lower() not in stopwords:
        result.append(word)

# join into string
result = " ".join(result)

Result:

'love python John'

Answered By: Sembei Norimaki