How to create a wordcloud according to frequencies in a pandas dataframe
Question:
I have to plot a word cloud. 'tweets.csv' is loaded into a pandas DataFrame which has a column named 'text'. The plotted cloud isn't based on the most common words, though. How can the word sizes be linked to their frequencies in the DataFrame?
text = df_final.text.values
wordcloud = WordCloud(
    #mask = logomask,
    max_words = 1000,
    width = 600,
    height = 400,
    #max_font_size = 1000,
    #min_font_size = 100,
    normalize_plurals = True,
    #scale = 5,
    #relative_scaling = 0,
    background_color = 'black',
    stopwords = STOPWORDS.union(stopwords)
).generate(str(text))
fig = plt.figure(
    figsize = (50,40),
    facecolor = 'k',
    edgecolor = 'k')
plt.imshow(wordcloud, interpolation = 'bilinear')
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()
My dataframe looks like this:
0 RT @Pontifex_pt: Temos que descobrir as riquezezas ...
1 RT @Pontifex_pt: Todos estamos em viagem rumo ...
2 RT @Pontifex_pt: Unamos as forças, em todos ...
3 RT @GeneralMourao: #Segurançapública, preocupa ...
4 RT @FIFAcom: The Brasileirao U-17 final provided ...
Answers:
Set up a sample DataFrame:
- Also see DataCamp: Generating WordClouds in Python
- Package documentation is at WordCloud for Python documentation, GitHub: wordcloud, and Anaconda: wordcloud
import pandas as pd
df = pd.DataFrame({'word': ['how', 'are', 'you', 'doing', 'this', 'afternoon'],
                   'count': [7, 10, 4, 1, 20, 100]})
        word  count
0        how      7
1        are     10
2        you      4
3      doing      1
4       this     20
5  afternoon    100
Convert the word & count columns to a dict:
- WordCloud().generate_from_frequencies() requires a dict
- Use one of the following methods
# method 1: convert to dict
data = dict(zip(df['word'].tolist(), df['count'].tolist()))
# method 2: convert to dict
data = df.set_index('word').to_dict()['count']
print(data)
[out]: {'how': 7, 'are': 10, 'you': 4, 'doing': 1, 'this': 20, 'afternoon': 100}
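The conversions above assume the DataFrame already holds one row per word with its count. The question instead has a column of raw tweet text, so the counts have to be computed first. Passing str(text) to generate() is also a likely source of the problem, since the string form of a large NumPy array is abbreviated with "..." and so may contain only part of the data. A minimal sketch using only the standard library follows; the function name word_frequencies and the sample tweets list are assumptions made for illustration, standing in for something like df_final['text'].tolist().

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count how often each word occurs across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Lowercase so "Todos" and "todos" are counted together.
        # \w+ is Unicode-aware in Python 3, so accented words stay intact.
        counts.update(re.findall(r"\w+", text.lower()))
    return dict(counts)


# Hypothetical stand-in for df_final['text'].tolist()
tweets = [
    "Todos estamos em viagem",
    "Unamos as forças, em todos",
]
freqs = word_frequencies(tweets)
print(freqs)
```

The resulting dict has the same shape as data above, so it can be passed to generate_from_frequencies in the same way.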
Wordcloud:
- Use .generate_from_frequencies
- Signature: generate_from_frequencies(frequencies, max_font_size=None)
from wordcloud import WordCloud
wc = WordCloud(width=800, height=400, max_words=200).generate_from_frequencies(data)
Plot:
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 10))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()
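One difference from the question's code is worth noting: the wordcloud documentation states that the stopwords argument is ignored when generate_from_frequencies is used, so unwanted words have to be removed from the dict beforehand. A minimal sketch, assuming a hypothetical helper drop_stopwords and a made-up stopword set:

```python
def drop_stopwords(frequencies, stopwords):
    """Return a copy of the frequency dict without the stopword entries."""
    blocked = {word.lower() for word in stopwords}
    return {
        word: count
        for word, count in frequencies.items()
        if word.lower() not in blocked
    }


data = {'how': 7, 'are': 10, 'you': 4, 'doing': 1, 'this': 20, 'afternoon': 100}
custom_stopwords = {'are', 'this'}  # hypothetical; could be STOPWORDS.union(...)
filtered = drop_stopwords(data, custom_stopwords)
print(filtered)
```

The filtered dict can then be passed to generate_from_frequencies in place of data.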
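Using an image mask:
import numpy as np
from PIL import Image
# Load the mask image; pure white areas of the mask are left empty
twitter_mask = np.array(Image.open('twitter.png'))
# data_nyt is a {word: count} dict, built the same way as data above
# width and height are ignored when a mask is given; the mask's shape is used instead
wc = WordCloud(background_color='white', width=800, height=400, max_words=200, mask=twitter_mask).generate_from_frequencies(data_nyt)
plt.figure(figsize=(10, 10))
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.figure()
plt.imshow(twitter_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()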
Here is an example using the following word counts:
will-2
freedom-8
ring-3
day-3
dream-5
let-2
every-3
able-2
one-3
together-4
First, import the necessary libraries:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
Then store the words and their counts in a dict (not a list):
text={'will': 2, 'freedom': 8, 'ring': 3, 'day': 3, 'dream': 5, 'let': 2, 'every': 3, 'able': 2, 'one': 3, 'together': 4}
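If the counts arrive as "word-count" lines like the listing above rather than as a ready-made dict, they can be parsed into one. This is only a sketch; the helper name parse_counts and the raw string are assumptions made for illustration.

```python
def parse_counts(lines):
    """Turn lines such as 'freedom-8' into a {word: count} dict."""
    freqs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last hyphen so hyphenated words keep their hyphens.
        word, _, count = line.rpartition('-')
        freqs[word] = int(count)
    return freqs


raw = """will-2
freedom-8
ring-3
day-3
dream-5
let-2
every-3
able-2
one-3
together-4"""

text = parse_counts(raw.splitlines())
print(text)
```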
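Then create the WordCloud object:
wordcloud = WordCloud(width=800, height=800, margin=0, repeat=True).generate_from_frequencies(text)
Note that repeat=True is optional: it repeats words until max_words or min_font_size is reached, which fills the canvas when there are only a few words. Without it, each word is drawn once.
Then display the image: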
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()