count words in a string without using split

Question:

I am working on a problem where I need to count the number of words in a string without using the split() function in Python.
I thought of an approach where I take a variable word=0 and increment it every time there’s a space in the string, but it doesn’t seem to work: it always gives a count lower than the actual one.

s="the sky is blue"

def countW(s):
    print(s)
    word=0
    for i in s:
        if i==" ":
            word=word+1
    print(word)
countW(s)

I know it’s a simple question, but I am struggling to understand what else I should take into account to make sure I get the right count.
The second approach I was considering involves too many for loops, array creation and then conversion back to a string.
Can anyone point me to a simpler approach that doesn’t increase the time complexity?

Asked By: Faith lost


Answers:

Counting the number of spaces is a good approach and works most of the time. Of course, you have to add 1 to get the correct number of words, since n spaces separate n + 1 words.

However, since you seem to be concerned about poorly formatted strings, you have to consider multiple consecutive spaces, leading and trailing whitespace, and punctuation.
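To make those failure cases concrete, here is a small sketch that applies the count-spaces-plus-one rule to a few inputs. The helper name `spaces_plus_one` and the sample strings are hypothetical, chosen only to hit each problem case; the expected word counts are written out by hand.

```python
def spaces_plus_one(s):
    # Naive rule: n spaces separate n + 1 words.
    return s.count(" ") + 1

# (text, actual number of words) pairs chosen to hit each problem case.
samples = [
    ("the sky is blue", 4),   # well formed: the rule works
    ("  the  sky ", 2),       # repeated, leading and trailing spaces
    ("the sky\tis", 3),       # a tab is whitespace but not " "
    ("", 0),                  # empty string
]
for text, expected in samples:
    print(repr(text), "naive:", spaces_plus_one(text), "expected:", expected)
```

Only the first sample comes out right (the naive counts are 4, 6, 2 and 1), which is why the approach below looks at character classes instead of single spaces.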

If you do not want to use regular expressions (as in Ezsrac’s answer), here is an alternative that treats any run of letters, digits and underscores as a word, just like \w does. It simply counts every transition from a word character to a non-word character. The end of the string requires special attention, because the last word is not followed by a non-word character when the string ends in a word character (compare "a a " with "a a").

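def is_word_character(c):
    # ASCII letters, digits and underscore
    return 'a' <= c <= 'z' or 'A' <= c <= 'Z' or '0' <= c <= '9' or c == '_'

def word_count(s):
    if not s:  # avoid an IndexError on s[-1] for the empty string
        return 0
    c = 0
    for i in range(1, len(s)):
        # a word ends where a word character is followed by a non-word character
        if not is_word_character(s[i]) and is_word_character(s[i-1]):
            c += 1
    # a word at the very end has no following non-word character, so count it here
    if is_word_character(s[-1]):
        c += 1
    return c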

Here are some test cases:

>>> word_count("the sky is blue")
4
>>> word_count("the sky is blue.The")
5
>>> word_count(" the sky is   blue ")
4
>>> word_count(" the sky is   blue\nand not green ")
7

If you also want to include other characters, you can simply extend the is_word_character function, but be aware that it is not possible to handle all corner cases without very advanced techniques. For example, consider "You are good-looking" vs. "This is good-looking into the sky". Such a simple program cannot recognize that the first one contains a compound adjective while the second one consists of two poorly linked sentences.
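For example, here is a sketch of such an extension. The names `is_word_character_ext` and `word_count_ext` are hypothetical, and treating the apostrophe and hyphen as word characters is an assumption made for illustration. It keeps "don't" and "good-looking" whole, and it also shows the trade-off described above: both example sentences now see "good-looking" as a single word.

```python
def is_word_character_ext(c):
    # Hypothetical extension: ASCII letters, digits and underscore as before,
    # plus apostrophe and hyphen so contractions and hyphenated words stay whole.
    return ('a' <= c <= 'z' or 'A' <= c <= 'Z' or '0' <= c <= '9'
            or c in "_'-")

def word_count_ext(s):
    # Same transition counting as word_count, using the extended predicate.
    if not s:
        return 0
    c = 0
    for i in range(1, len(s)):
        if not is_word_character_ext(s[i]) and is_word_character_ext(s[i - 1]):
            c += 1
    if is_word_character_ext(s[-1]):
        c += 1
    return c

print(word_count_ext("You are good-looking"))               # 3
print(word_count_ext("This is good-looking into the sky"))  # 6
print(word_count_ext("don't stop"))                          # 2
```

With the original is_word_character, the second sentence would count 7 words, because the hyphen splits "good" from "looking".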

Answered By: koalo

If you really don’t want to use split, you could try a regex:

import re
s = "the sky is blue"
count = len(re.findall(r'\w+', s))
print(count)
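A variant of the same idea, sketched here as an alternative rather than as part of the answer above, counts the matches with `re.finditer` so that the list of matched words is never built. The function name `count_words_regex` is hypothetical. Note that in Python 3, `\w` in a `str` pattern matches Unicode word characters, so accented letters count as part of a word.

```python
import re

def count_words_regex(s):
    # \w+ matches each maximal run of word characters (letters, digits, underscore).
    # re.finditer yields match objects lazily, so no list of words is kept.
    return sum(1 for _ in re.finditer(r"\w+", s))

print(count_words_regex("the sky is blue"))          # 4
print(count_words_regex(" the sky is   blue.The "))  # 5
print(count_words_regex("naïve café"))               # 2
```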
Answered By: Ezsrac

You could also use itertools.groupby, grouping by whether the characters are alphanumeric, and summing the group keys (True counts as 1, False as 0).

>>> import itertools
>>> s = "the sky is blue"
>>> sum(k for (k, g) in itertools.groupby(s, key=str.isalnum))
4
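The key function decides what counts as a word. As a sketch (the function name `count_non_space_runs` is hypothetical), grouping by `str.isspace` and counting the groups that are not whitespace treats any run of non-whitespace characters, punctuation included, as one word, which should match `len(s.split())` for ordinary whitespace:

```python
import itertools

def count_non_space_runs(s):
    # groupby yields one (key, group) pair per run of consecutive characters
    # with the same key; the key is True for whitespace runs, False otherwise.
    return sum(not is_space for is_space, _ in itertools.groupby(s, key=str.isspace))

print(count_non_space_runs(" the sky\tis   blue "))  # 4
print(count_non_space_runs("good-looking"))          # 1
print(count_non_space_runs(""))                      # 0
```

With key=str.isalnum, "good-looking" would count as 2 words instead.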
Answered By: tobias_k

The simplest solution is a finite automaton with two states: inside a word or outside one. Pseudocode:

InsideWord = false
Count = 0
for c in s
    if c is not letter
        InsideWord = false
    else
        if not InsideWord
            Count++
            InsideWord = true
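A direct Python translation of this pseudocode, as a sketch: the function name `count_words_fsm` is hypothetical, and "letter" is taken to mean `str.isalpha()`, which is an assumption since the pseudocode does not define it.

```python
def count_words_fsm(s):
    inside_word = False  # state: are we currently inside a word?
    count = 0
    for c in s:
        if not c.isalpha():
            inside_word = False  # any non-letter ends the current word
        elif not inside_word:
            count += 1           # entering a word from outside: count it
            inside_word = True
    return count

print(count_words_fsm(" the sky is   blue "))  # 4
print(count_words_fsm("blue.The"))             # 2
print(count_words_fsm(""))                     # 0
```

It makes a single pass over the string and keeps only two variables, so it runs in O(n) time with O(1) extra space.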
Answered By: MBo

Simply initialize word to 1 instead of 0:

print("count words")

s = "the sky is dark and lit with stars"

def countW(s):
    print(s)
    word=1  # n spaces separate n + 1 words (assumes single spaces, none leading or trailing)

    for i in s:
        if i == " ":
            word=word+1
    print(word)

countW(s)
Answered By: Mayuresh Pachangane