Unable to import process_tweets from utils

Question:

Thanks for looking into this. I have a Python program that needs process_tweet and build_freqs for an NLP task. nltk is already installed, but utils wasn’t, so I installed it with pip install utils. The two functions apparently weren’t installed with it, and I get the standard error:

ImportError: cannot import name 'process_tweet' from
'utils' (C:\Python\lib\site-packages\utils\__init__.py)

What have I done wrong, or is anything missing?
I also looked at this Stack Overflow answer, but it didn’t help.

Asked By: Pawan Nirpal


Answers:

Try this code; it should work:

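import re
import string

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer


def process_tweet(tweet):
    stemmer = PorterStemmer()
    # requires the NLTK stopwords corpus: nltk.download('stopwords')
    stopwords_english = stopwords.words('english')
    tweet = re.sub(r'\$\w*', '', tweet)  # remove stock tickers like $GE
    tweet = re.sub(r'^RT[\s]+', '', tweet)  # remove old-style retweet prefix
    tweet = re.sub(r'https?://.*[\r\n]*', '', tweet)  # remove hyperlinks
    tweet = re.sub(r'#', '', tweet)  # drop the hash sign, keep the word
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True,
                               reduce_len=True)
    tweet_tokens = tokenizer.tokenize(tweet)

    tweets_clean = []
    for word in tweet_tokens:
        if (word not in stopwords_english and
                word not in string.punctuation):
            stem_word = stemmer.stem(word)  # stemming word
            tweets_clean.append(stem_word)

    return tweets_clean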
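
The regular-expression steps can be tried on their own with only the standard library. This is a minimal sketch, assuming the patterns are meant to read as below, with the backslashes in \$, \w, \s, \r and \n intact. The helper name strip_tweet_markup and the sample tweet are made up for illustration.

```python
import re


def strip_tweet_markup(tweet):
    """Apply only the regex clean-up steps, without tokenizing or stemming."""
    tweet = re.sub(r'\$\w*', '', tweet)               # stock tickers like $GE
    tweet = re.sub(r'^RT[\s]+', '', tweet)            # old-style retweet prefix
    tweet = re.sub(r'https?://.*[\r\n]*', '', tweet)  # hyperlink and the rest of its line
    tweet = re.sub(r'#', '', tweet)                   # hash sign only, the word stays
    return tweet


if __name__ == "__main__":
    sample = "RT @user: $GE is up #happy https://t.co/abc"  # hypothetical tweet
    print(strip_tweet_markup(sample).split())
```

Note that the hyperlink pattern removes everything from the link to the end of its line, so any text after a link on the same line is dropped as well.
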
Answered By: Hamad Alibrahim

If you are following the NLP course on deeplearning.ai, then I believe utils.py was written by the course instructors for use in the lab sessions. It shouldn’t be confused with the unrelated utils package that pip install utils installs.
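
One way to check which utils Python is actually picking up is to ask the import system. This is a rough sketch using only the standard library; the helper names locate_module and module_has are made up for illustration.

```python
import importlib
import importlib.util


def locate_module(name):
    """Return the file Python would load for `name`, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def module_has(name, attribute):
    """Return True if module `name` imports cleanly and defines `attribute`."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    print("utils resolves to:", locate_module("utils"))
    for func in ("process_tweet", "build_freqs"):
        print(func, "available:", module_has("utils", func))
```

If the path points into site-packages, the pip-installed package is being imported. Putting the course's utils.py in the same folder as your script or notebook (and, if needed, running pip uninstall utils) should make the import resolve to the course file instead.
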

Answered By: sanat bhargava

In IPython or Jupyter you can view the source of a function with ??, in this case process_tweet??. The code below is from the custom utils library of the deeplearning.ai NLP course:

# imports the function relies on
import re
import string

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer


def process_tweet(tweet):
    """Process tweet function.
    Input:
        tweet: a string containing a tweet
    Output:
        tweets_clean: a list of words containing the processed tweet

    """
    stemmer = PorterStemmer()
    stopwords_english = stopwords.words('english')
    # remove stock market tickers like $GE
    tweet = re.sub(r'\$\w*', '', tweet)
    # remove old style retweet text "RT"
    tweet = re.sub(r'^RT[\s]+', '', tweet)
    # remove hyperlinks
    tweet = re.sub(r'https?://.*[\r\n]*', '', tweet)
    # remove hashtags
    # only removing the hash # sign from the word
    tweet = re.sub(r'#', '', tweet)
    # tokenize tweets
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True,
                               reduce_len=True)
    tweet_tokens = tokenizer.tokenize(tweet)

    tweets_clean = []
    for word in tweet_tokens:
        if (word not in stopwords_english and  # remove stopwords
                word not in string.punctuation):  # remove punctuation
            # tweets_clean.append(word)
            stem_word = stemmer.stem(word)  # stemming word
            tweets_clean.append(stem_word)

    return tweets_clean
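
Outside IPython, where ?? is not available, the standard library's inspect.getsource gives a similar view. This is a small sketch; show_source is a made-up helper name, and textwrap.dedent is used only as a stand-in for process_tweet so the example runs without the course files.

```python
import inspect
import textwrap


def show_source(obj):
    """Return the source code of a Python-defined function, class or module."""
    return inspect.getsource(obj)


if __name__ == "__main__":
    # Stand-in target; with the course files present this could be process_tweet.
    print(show_source(textwrap.dedent))
```

This only works for objects written in Python whose source file is available, not for built-ins implemented in C.
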
Answered By: Emkan

I guess you don’t need to use process_tweet at all. The function in the course is just a shortcut that wraps everything you do from the beginning up to the stemming step. You can skip it and print tweet_stem to see the difference between the original text and the preprocessed text.

Answered By: Ho bao

You can try this.

import re
import string

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer


def preprocess_tweet(tweet):
    # cleaning
    tweet = re.sub(r'^RT[\s]+', '', tweet)  # remove retweet prefix
    tweet = re.sub(r'https?://[^\s\n\r]+', '', tweet)  # remove hyperlinks
    tweet = re.sub(r'#', '', tweet)  # drop the hash sign
    # drop the at sign (strip_handles below then has no handles left to strip)
    tweet = re.sub(r'@', '', tweet)

    # tokenization
    token = TweetTokenizer(preserve_case=False, strip_handles=True,
                           reduce_len=True)
    tweet_tokenized = token.tokenize(tweet)

    # remove stop words and punctuation
    stopwords_english = stopwords.words('english')
    tweet_processed = []
    for word in tweet_tokenized:
        if (word not in stopwords_english and
                word not in string.punctuation):
            tweet_processed.append(word)

    # stemming
    tweet_stem = []
    stem = PorterStemmer()
    for word in tweet_processed:
        stem_word = stem.stem(word)
        tweet_stem.append(stem_word)

    return tweet_stem
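
The question also asks for build_freqs, which none of the answers reproduce. This is a minimal sketch, assuming it is meant to map each (word, sentiment label) pair to the number of times it occurs. The signature, the tokenize parameter and the sample data are assumptions for illustration, not the course's code.

```python
from collections import Counter


def build_freqs(tweets, labels, tokenize=str.split):
    """Count how often each (word, label) pair occurs.

    tokenize is a stand-in for a tweet preprocessing function,
    so the sketch runs without NLTK.
    """
    freqs = Counter()
    for tweet, label in zip(tweets, labels):
        for word in tokenize(tweet):
            freqs[(word, label)] += 1
    return dict(freqs)


if __name__ == "__main__":
    tweets = ["happy happy day", "sad day"]  # hypothetical data
    labels = [1, 0]                          # 1 = positive, 0 = negative
    print(build_freqs(tweets, labels))
```

To combine it with the preprocessing above, pass the tweet-cleaning function as tokenize instead of str.split.
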


Answered By: Malik Hamza