Find most common words from set of sentences in Python

Question:

I have 5 sentences in a np.array and I want to find the most common n number of words that appear. For example if n was 3 I would want the 3 most common words. I have an example below:

0    oh i am she cool though might off her a brownie lol
1    so trash wouldnt do colors better tweet
2    love monkey brownie as much as a tweet
3    monkey get this tweet around i think
4    saw a brownie to make me some monkey

If n was 3 I would like it to print the words: brownie, monkey, tweet. Is there a straighforward way to do something like this?

Asked By: thelaw

||

Answers:

You can do it with the help of CountVectorizer as shown below:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

A = np.array(["oh i am she cool though might off her a brownie lol", 
              "so trash wouldnt do colors better tweet", 
              "love monkey brownie as much as a tweet",
              "monkey get this tweet around i think",
              "saw a brownie to make me some monkey" ])

n = 3
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(A)

vocabulary = vectorizer.get_feature_names()
ind  = np.argsort(X.toarray().sum(axis=0))[-n:]

top_n_words = [vocabulary[a] for a in ind]

print (top_n_words)
['tweet', 'monkey', 'brownie']
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.