How to count the number of words in a sentence, ignoring numbers, punctuation and whitespace?

Question:

How would I go about counting the words in a sentence? I’m using Python.

For example, I might have the string:

string = "I     am having  a   very  nice  23!@$      day. "

That would be 7 words. I’m having trouble with the random amount of spaces after/before each word as well as when numbers or symbols are involved.

Asked By: HossBender

||

Answers:

str.split() without any arguments splits on runs of whitespace characters:

>>> s = 'I am having a very nice day.'
>>> 
>>> len(s.split())
7

From the Python documentation for str.split():

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
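As a rough illustration of that behaviour on the string from the question (the variable name tokens is only for this sketch): split() by itself still treats 23!@$ as a token, so on that input it gives 8 rather than the 7 the question asks for.

```python
# String from the question: irregular spacing plus a non-word token.
s = "I     am having  a   very  nice  23!@$      day. "

# With no arguments, split() separates on runs of whitespace and
# produces no empty strings for the leading/trailing whitespace.
tokens = s.split()
print(tokens)
print(len(tokens))  # 8, because '23!@$' is still counted
```

Getting down to 7 requires an extra filtering step on the tokens, which is what the regex-based answers address.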

Answered By: arshajii

You can use re.findall():

import re
line = " I am having a very nice day."
# \w+ matches runs of letters, digits and underscores
count = len(re.findall(r'\w+', line))
print(count)
Answered By: karthikr

OK, here is my version. I noticed that you want the output to be 7, which means you don't want to count special characters and numbers. So here is the regex pattern:

re.findall("[a-zA-Z_]+", string)

Where [a-zA-Z_]+ matches one or more consecutive characters that are lowercase letters (a-z), uppercase letters (A-Z) or underscores.
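A minimal sketch of that pattern applied to the string from the question (the variable names s and words are chosen here only for illustration):

```python
import re

s = "I     am having  a   very  nice  23!@$      day. "

# Each maximal run of ASCII letters or underscores is one match, so
# digits, punctuation and whitespace all act as separators.
words = re.findall(r"[a-zA-Z_]+", s)
print(words)
print(len(words))  # 7
```

One consequence of this choice is that a contraction such as "don't" produces two matches ("don" and "t"), because the apostrophe is not in the character class.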


About spaces: if you want to remove all the extra spaces, just do:

string = string.strip()  # Remove whitespace at the start and at the end of the string
while "  " in string:  # While there are two consecutive spaces anywhere in the string...
    string = string.replace("  ", " ")  # ... replace them with one space!
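A shorter way to get a similar normalization, offered here as an alternative sketch rather than as part of the answer above, is to split on whitespace and rejoin with single spaces. The name normalized is only for this example.

```python
s = "I     am having  a   very  nice  23!@$      day. "

# split() drops leading/trailing whitespace and collapses each run of
# whitespace; join() then puts exactly one space between the pieces.
normalized = " ".join(s.split())
print(normalized)
```

Unlike the replace loop, this also collapses tabs and newlines, which may or may not be what you want.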
Answered By: JadedTuna

This is a simple word counter using regex. The script runs in a loop, which you can terminate by entering "Done".

# Word counter using regex
import re
while True:
    string = input("Enter the string: ")  # use raw_input() on Python 2
    if string == "Done":  # command to terminate the loop
        break
    count = len(re.findall("[a-zA-Z_]+", string))
    print(count)
print("Terminated")
Answered By: Aliyar
def wordCount(mystring):
    tempcount = 0  # length of the current run of spaces
    count = 1

    try:
        for character in mystring:
            if character == " ":
                tempcount += 1
                if tempcount == 1:
                    # First space of a run: a new word may follow
                    count += 1
            else:
                tempcount = 0

        return count

    except Exception:
        error = "Not a string"
        return error

mystring = "I   am having   a    very nice 23!@$      day."

print(wordCount(mystring))

The output is 8, because 23!@$ is counted as a word.

Answered By: Darrell White
import string

s = "I     am having  a   very  nice  23!@$      day. "
sum(i.strip(string.punctuation).isalpha() for i in s.split())

The statement above goes through each whitespace-separated chunk of text and strips leading and trailing punctuation before checking whether the chunk consists only of alphabetic characters.
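The same idea can be spelled out step by step. This is only a sketch; the function name count_alpha_words is hypothetical and not part of the answer above.

```python
import string

def count_alpha_words(text):
    """Count whitespace-separated chunks that are purely alphabetic
    once leading and trailing punctuation has been stripped."""
    count = 0
    for chunk in text.split():
        core = chunk.strip(string.punctuation)  # strips the ends only
        if core.isalpha():  # False for '' and for anything with digits
            count += 1
    return count

s = "I     am having  a   very  nice  23!@$      day. "
print(count_alpha_words(s))  # 7
```

Because strip() only removes punctuation at the ends, a chunk with punctuation inside it (for example "don't") fails isalpha() and is not counted.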

Answered By: boon kwee

How about using a simple loop that counts the spaces? This works when the words are separated by single spaces:

txt = "Just an example here move along"
count = 1
for i in txt:
    if i == " ":
        count += 1
print(count)
Answered By: Anto
import string

sentence = "I     am having  a   very  nice  23!@$      day. "
# Remove all punctuation
sentence = sentence.translate(str.maketrans('', '', string.punctuation))
# Remove all digits
sentence = ''.join(ch for ch in sentence if not ch.isdigit())
count = 0
# Count each position where a non-space character is followed by whitespace
# (relies on the sentence ending in whitespace to count the last word)
for index in range(len(sentence) - 1):
    if sentence[index + 1].isspace() and not sentence[index].isspace():
        count += 1
print(count)
Answered By: Adam