stop-words

Can stop phrases be removed while doing text processing in python?

Question: The task I’m working on involves finding the cosine similarity, using TF-IDF, between a base transcript and other sample transcripts. I am removing stop words for this, but I would also like to remove certain stop phrases that are unique to …

Total answers: 1
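One way to handle stop phrases is to strip them with a regular expression before the usual stop-word removal and TF-IDF step. This is a sketch under assumptions, not the accepted answer: the phrase list and the function name remove_stop_phrases are hypothetical. Longer phrases are tried first so that a long phrase is not partially consumed by a shorter phrase it contains.

```python
import re
from typing import Iterable


def remove_stop_phrases(text: str, phrases: Iterable[str]) -> str:
    """Remove whole-word, case-insensitive occurrences of each phrase."""
    # Sort longest first so "thank you for calling" wins over a shorter "thank you".
    ordered = sorted({p.strip().lower() for p in phrases if p.strip()}, key=len, reverse=True)
    if not ordered:
        return text
    # Allow any run of whitespace between the words of a phrase.
    alternatives = (r"\s+".join(re.escape(w) for w in p.split()) for p in ordered)
    # (?<!\w) and (?!\w) stop a phrase from matching inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)", re.IGNORECASE)
    cleaned = pattern.sub(" ", text)
    # Collapse the gaps left behind by removed phrases.
    return re.sub(r"\s+", " ", cleaned).strip()


# Hypothetical phrases that appear in every transcript.
STOP_PHRASES = ["thank you for calling", "how can I help you"]

print(remove_stop_phrases("Thank you for calling Acme, how can I help you today?", STOP_PHRASES))
# prints: Acme, today?
```

The cleaned transcripts can then go through the existing stop-word removal and TF-IDF vectorization unchanged.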

remove stopwords using for loop in python

Question: I have recently been studying Python loops, and I want to see whether I can use a for loop to remove stop words and punctuation. However, my code doesn’t work: it reports that punctuation is not defined. (The stop words can now be removed, thanks to the comment.) I know it …

Total answers: 5
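A likely cause of the "punctuation is not defined" error is a missing import: `punctuation` lives in the standard-library string module. A minimal for-loop sketch, assuming a small hand-written stop-word set in place of whatever list the asker used; the function name clean is hypothetical.

```python
from string import punctuation  # without this import, using `punctuation` raises NameError

# Hypothetical stop-word set; with NLTK it could be set(stopwords.words("english")).
STOP_WORDS = {"a", "an", "and", "is", "the", "to"}


def clean(text):
    """Lower-case, strip punctuation and drop stop words using a plain for loop."""
    kept = []
    for word in text.lower().split():
        word = word.strip(punctuation)  # removes leading/trailing punctuation only
        if not word or word in STOP_WORDS:
            continue  # skip empty leftovers such as "--" and skip stop words
        kept.append(word)
    return kept


print(clean("The cat, and the dog, is happy!"))
# prints: ['cat', 'dog', 'happy']
```

Because str.strip only trims the ends of each word, internal punctuation such as the apostrophe in "don't" is kept.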

Spacy, how not to remove "not" when cleaning the text with space

Question: I use this spaCy code to process my text later, but I need negation words such as "not" to stay in the text.
nlp = spacy.load("en_core_web_sm")
def my_tokenizer(sentence):
    return [token.lemma_ for token in tqdm(nlp(sentence.lower()), leave=False) if token.is_stop == …

Total answers: 1
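One way to keep negations without editing spaCy's stop list is to keep the is_stop test but exempt a small set of words. The sketch below assumes a hypothetical negation set KEEP (adjust as needed) and a hypothetical function name keep_negations; it only relies on the is_stop, lower_ and lemma_ attributes that spaCy tokens expose, so it can be exercised without loading a model.

```python
from typing import Iterable, List

# Hypothetical set of negation words to keep even though spaCy flags them as stop words.
KEEP = frozenset({"not", "no", "never", "n't"})


def keep_negations(tokens: Iterable, keep: frozenset = KEEP) -> List[str]:
    """Return lemmas of non-stop tokens, never dropping tokens whose lower-cased text is in `keep`.

    `tokens` is any iterable of spaCy-style tokens with .is_stop, .lower_ and .lemma_,
    for example nlp(sentence.lower()).
    """
    return [t.lemma_ for t in tokens if not t.is_stop or t.lower_ in keep]
```

With the pipeline from the question, the call would be keep_negations(nlp(sentence.lower())), and the tqdm wrapper can stay around the nlp(...) call if a progress bar is wanted. Filtering at this point leaves the shared stop-word list untouched, so other code using the same nlp object is not affected.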

Remove specific stopwords Pyspark

Question: I’m new to PySpark and would like to remove some French stop words from a PySpark column. Due to some constraints I can’t use NLTK or spaCy, so StopWordsRemover is the only option I have. Below is what I have tried so far, without success:
from pyspark.ml import *
from pyspark.ml.feature import *
stop = ['EARL …

Total answers: 2

Add/remove custom stop words with spacy

Question: What is the best way to add or remove stop words with spaCy? I am using the token.is_stop attribute and would like to make some custom changes to the set. I looked at the documentation but could not find anything about stop words. Thanks! Asked By: E.K. || Source …

Total answers: 8
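A minimal sketch, assuming spaCy v3 and a blank English pipeline (spacy.blank("en")) so no model has to be downloaded; the helper name set_stop_words and the example words are hypothetical. It sets the is_stop flag on each lexeme in the vocabulary, which is the value token.is_stop reads.

```python
import spacy

# Blank English pipeline: includes spaCy's default English stop words, no model download needed.
nlp = spacy.blank("en")


def set_stop_words(nlp, words, is_stop=True):
    """Flag (or unflag) words as stop words on the pipeline's vocabulary."""
    for word in words:
        # Lexeme flags are per exact string, so cover the usual casings.
        for form in {word, word.lower(), word.capitalize()}:
            nlp.vocab[form].is_stop = is_stop


set_stop_words(nlp, ["btw"])                 # add a custom stop word
set_stop_words(nlp, ["not"], is_stop=False)  # stop treating "not" as a stop word

filtered = [t.text for t in nlp("btw this is not bad") if not t.is_stop]
print(filtered)
```

A commonly suggested alternative is to edit nlp.Defaults.stop_words. That set is shared by every English pipeline in the process, and changing it may not update lexemes that were already created, so flagging the lexemes directly is the more predictable option.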

NLTK and Stopwords Fail #lookuperror

Question: I am starting a sentiment-analysis project and plan to use stop-word removal. After some research I found that NLTK has stop words, but when I execute the command I get an error. What I do is the following, in order to know …

Total answers: 7

Adding words to scikit-learn's CountVectorizer's stop list

Question: Scikit-learn’s CountVectorizer class lets you pass the string ‘english’ to the stop_words argument. I want to add some words to this predefined list. Can anyone tell me how to do this? Asked By: statsNoob || Source Answers: According to the source code for sklearn.feature_extraction.text, the full list …

Total answers: 1

Faster way to remove stop words in Python

Question: I am trying to remove stop words from a string of text:
from nltk.corpus import stopwords
text = 'hello bye the the hi'
text = ' '.join([word for word in text.split() if word not in (stopwords.words('english'))])
I am processing 6 million such strings, so speed is …

Total answers: 6
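In the snippet above, stopwords.words('english') sits inside the comprehension's condition, so it is called again for every word, and each membership test scans a list. Building a set once and reusing it avoids both costs. A sketch, with a small hypothetical set standing in for frozenset(stopwords.words("english")) so it runs without the NLTK corpus; the function name remove_stop_words is also hypothetical.

```python
# Build the lookup structure once; with NLTK this would be
# STOP_WORDS = frozenset(stopwords.words("english")).
STOP_WORDS = frozenset({"a", "an", "and", "of", "the"})  # hypothetical stand-in


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop stop words using O(1) average set lookups instead of list scans."""
    return " ".join(word for word in text.split() if word not in stop_words)


print(remove_stop_words("hello bye the the hi"))
# prints: hello bye hi
```

For millions of strings the same function can be applied with a list comprehension or map; either way the set is built only once.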

Stopword removal with NLTK

Question: I am trying to process user-entered text by removing stop words with the NLTK toolkit, but stop-word removal also removes words like ‘and’, ‘or’ and ‘not’. I want these words to remain after stop-word removal, because they are operators that are required later when processing the text as a query. …

Total answers: 6
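Because stop words are usually held in a set, keeping the query operators comes down to a set difference applied once before filtering. A sketch with a hypothetical stand-in for set(stopwords.words("english")); the operator set follows the words named in the question, and the function name strip_stop_words is hypothetical.

```python
# Words that must survive because they act as query operators.
OPERATORS = frozenset({"and", "or", "not"})

# Hypothetical stand-in for set(stopwords.words("english")), which also contains the operators.
BASE_STOP_WORDS = frozenset({"a", "and", "is", "not", "of", "or", "the"})

# Remove the operators from the stop list once, up front.
STOP_WORDS = BASE_STOP_WORDS - OPERATORS


def strip_stop_words(query):
    """Lower-case and split the query, dropping stop words but keeping operators."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


print(strip_stop_words("Cats and dogs not the birds"))
# prints: ['cats', 'and', 'dogs', 'not', 'birds']
```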