How to count the number of times a word is said by each author in a pandas DataFrame
Question:
I have a dataframe like this and I’m trying to count the words said by a specific author.
Author  Text                   Date
Jake    hey hey my names Jake  1.04.1997
Mac     hey my names Mac       1.02.2019
Sarah   heymy names Sarah      5.07.2001
I’ve been trying to set it up so that if I search for the word "hey" it produces:
Author  Count
Jake    2
Mac     1
Answers:
If df is your original dataframe:
newDF = pd.DataFrame(columns=['Author','Count'])
newDF['Author'] = df['Author']
newDF['Count'] = df['Text'].str.count("hey")
newDF.drop(newDF[newDF['Count'] == 0].index, inplace=True)
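To try this answer out, the question's sample data can be rebuilt by hand. The following is only a minimal sketch, assuming the three rows shown in the question and keeping Date as plain strings; the name new_df is illustrative. It counts per row rather than per author, and a plain "hey" pattern also matches inside "heymy".

```python
import pandas as pd

# Sample data copied from the question; Date is kept as plain strings.
df = pd.DataFrame({
    "Author": ["Jake", "Mac", "Sarah"],
    "Text": ["hey hey my names Jake", "hey my names Mac", "heymy names Sarah"],
    "Date": ["1.04.1997", "1.02.2019", "5.07.2001"],
})

# Per-row count of the substring "hey" (this also matches inside "heymy").
new_df = df[["Author"]].copy()
new_df["Count"] = df["Text"].str.count("hey")

# Keep only rows with at least one match, without mutating in place.
new_df = new_df[new_df["Count"] > 0]
print(new_df)
```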
Use Series.str.count with aggregate sum:
df1 = df['Text'].str.count('hey').groupby(df['Author']).sum().reset_index(name='Count')
print (df1)
  Author  Count
0   Jake      2
1    Mac      1
2  Sarah      1
Note that the plain pattern 'hey' also matches inside "heymy", so Sarah is counted as well.
If you need to filter out rows with 0 values, add boolean indexing:
s = df['Text'].str.count('hey')
df1 = s[ s.gt(0)].groupby(df['Author']).sum().reset_index(name='Count')
print (df1)
  Author  Count
0   Jake      2
1    Mac      1
2  Sarah      1
With this sample every text contains the substring "hey", so no row is dropped.
EDIT: To match hey only as a separate word, wrap it in word boundaries \b ... \b, like:
df1 = df['Text'].str.count(r'\bhey\b').groupby(df['Author']).sum().reset_index(name='Count')
print (df1)
Author Count
0 Jake 2
1 Mac 1
2 Sarah 0
s = df['Text'].str.count(r'\bhey\b')
df1 = s[ s.gt(0)].groupby(df['Author']).sum().reset_index(name='Count')
print (df1)
Author Count
0 Jake 2
1 Mac 1
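The word-boundary approach can be wrapped in a small reusable function. This is a rough sketch rather than part of the original answers: count_word_by_author and its ignore_case parameter are hypothetical names, and the sample frame is rebuilt from the question's rows. It uses re.escape so that a search word containing regex metacharacters is treated literally.

```python
import re

import pandas as pd


def count_word_by_author(df, word, ignore_case=False):
    """Count whole-word occurrences of `word` per author (hypothetical helper)."""
    # re.escape guards against regex metacharacters in the search word;
    # \b on both sides restricts matches to standalone words.
    pattern = rf"\b{re.escape(word)}\b"
    flags = re.IGNORECASE if ignore_case else 0
    counts = df["Text"].str.count(pattern, flags=flags)
    # Drop rows without a match, then total the matches per author.
    return (
        counts[counts.gt(0)]
        .groupby(df["Author"])
        .sum()
        .reset_index(name="Count")
    )


# Sample data copied from the question.
df = pd.DataFrame({
    "Author": ["Jake", "Mac", "Sarah"],
    "Text": ["hey hey my names Jake", "hey my names Mac", "heymy names Sarah"],
    "Date": ["1.04.1997", "1.02.2019", "5.07.2001"],
})

result = count_word_by_author(df, "hey")
print(result)
```

Passing ignore_case=True would also count variants such as "Hey", which the plain pattern misses.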