How to count how many times each author said a word in a pandas dataframe

Question:

I have a dataframe like this, and I’m trying to count how many times each author said a specific word.

Author              Text                   Date
Jake                hey hey my names Jake  1.04.1997
Mac                 hey my names Mac       1.02.2019
Sarah               heymy names Sarah      5.07.2001

I’ve been trying to set it up so that if I search for the word "hey", it produces:

Author              Count
Jake                2
Mac                 1
Asked By: Matt

Answers:

If df is your original dataframe:

import pandas as pd

# Count occurrences of "hey" in each row's text
newDF = pd.DataFrame({'Author': df['Author'], 'Count': df['Text'].str.count("hey")})
# Keep only the rows where "hey" appears at least once
newDF = newDF[newDF['Count'] > 0]

Answered By: Rodrigo Guzman

Use Series.str.count with aggregate sum:

df1 = df['Text'].str.count('hey').groupby(df['Author']).sum().reset_index(name='Count')
print (df1)
  Author  Count
0   Jake      2
1    Mac      0
2  Sarah      1

If you need to filter out rows with 0 values, add boolean indexing:

s = df['Text'].str.count('hey')
df1 = s[ s.gt(0)].groupby(df['Author']).sum().reset_index(name='Count')
print (df1)
  Author  Count
0   Jake      2
1  Sarah      1
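
The filtering step works because groupby aligns the Author column with the filtered counts by index label, so the two do not need the same length. A minimal sketch of that behaviour follows; the frame here is hypothetical data made up for illustration, not the data from the question:

```python
import pandas as pd

# Hypothetical data for illustration only (not from the question)
df = pd.DataFrame({
    "Author": ["Ann", "Bob", "Ann", "Cy"],
    "Text": ["hey there", "hello", "hey hey", "bye"],
})

s = df["Text"].str.count("hey")   # per-row counts: 1, 0, 2, 0
mask = s.gt(0)                    # True for rows with at least one match

# s[mask] keeps only index labels 0 and 2; groupby matches
# df["Author"] to those labels before summing per author
df1 = s[mask].groupby(df["Author"]).sum().reset_index(name="Count")
print(df1)
```

Only Ann remains in df1, with both of her rows summed, because Bob and Cy have no matching rows left after the mask is applied.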

EDIT: To match hey only as a separate word, add word boundaries \b\b, like:

df1 = df['Text'].str.count(r'\bhey\b').groupby(df['Author']).sum().reset_index(name='Count')
print (df1)
  Author  Count
0   Jake      2
1    Mac      1
2  Sarah      0


s = df['Text'].str.count(r'\bhey\b')
df1 = s[s.gt(0)].groupby(df['Author']).sum().reset_index(name='Count')
print (df1)
  Author  Count
0   Jake      2
1    Mac      1
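
If the search word comes from user input, the word-boundary approach can be wrapped in a small function. This is only a sketch: count_word_by_author is a hypothetical helper name, the frame is rebuilt by hand from the sample in the question, and re.escape is added on the assumption that the word should be matched literally rather than as a regex:

```python
import re
import pandas as pd

# Sample frame rebuilt by hand from the question
df = pd.DataFrame({
    "Author": ["Jake", "Mac", "Sarah"],
    "Text": ["hey hey my names Jake", "hey my names Mac", "heymy names Sarah"],
    "Date": ["1.04.1997", "1.02.2019", "5.07.2001"],
})

def count_word_by_author(frame, word):
    """Hypothetical helper: whole-word occurrences of `word` per author."""
    # Escape the word so characters like "." or "+" are matched literally
    pattern = rf"\b{re.escape(word)}\b"
    counts = frame["Text"].str.count(pattern)
    out = counts.groupby(frame["Author"]).sum().reset_index(name="Count")
    # Drop authors who never said the word
    return out[out["Count"] > 0].reset_index(drop=True)

result = count_word_by_author(df, "hey")
print(result)
```

For the sample data this keeps Jake and Mac and drops Sarah, since "heymy" does not contain hey as a separate word.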
Answered By: jezrael