How can I count occurrences of words specified in an array in Python?

Question:

I am working on a small program in which the user enters text and I would like to check how many times the given words occur in the given input.

# Read user input
import sys

print("Input your code:\n")

user_input = sys.stdin.read()
print(user_input)

For example, the text that I enter into the program is:

a=1
b=3
if (a == 1):
    print("A is a number 1")
elif(b == 3):
    print ("B is 3")
else: 
    print("A isn't 1 and B isn't 3")

The words to look for are specified in an array (a Python list):

wordsToFind = ["if", "elif", "else", "for", "while"]

Basically, I would like to print how many times "if", "elif" and "else" occur in the input.

How can I count occurrences of words like "if", "elif", "else", "for", "while" in a string entered by the user?

Asked By: Sisimośki


Answers:

I think the best option is to use Python's standard-library tokenize module:

# Let's say this is tokens.py
import sys
from collections import Counter
from io import BytesIO
from tokenize import tokenize

# Get input from stdin
code_text = sys.stdin.read()

# Tokenize the input as python code
tokens = tokenize(BytesIO(code_text.encode("utf-8")).readline)

# Filter the ones in wordsToFind
wordsToFind = ["if", "elif", "else", "for", "while"]
words = [token.string for token in tokens if token.string in wordsToFind]

# Count the occurrences
counter = Counter(words)

print(counter)

Test

Let’s say you have a test.py:

a=1
b=3
if (a == 1):
    print("A is a number 1")
elif(b == 3):
    print ("B is 3")
else: 
    print("A isn't 1 and B isn't 3")

and then you run:

cat test.py | python tokens.py

Output:

Counter({'if': 1, 'elif': 1, 'else': 1})
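Note that the Counter only lists words that actually occurred, so "for" and "while" are missing from the output above. If you also want a line for every word in wordsToFind, one possible sketch is below. It relies on Counter returning 0 for missing keys; the helper name format_counts is made up for illustration, and the counter is hard-coded to the result shown above rather than read from stdin.

```python
from collections import Counter

wordsToFind = ["if", "elif", "else", "for", "while"]


def format_counts(counter, words):
    """Return one 'word: count' line per word, including zero counts."""
    # Counter returns 0 for keys it has never seen, so no KeyError here.
    return [f"{word}: {counter[word]}" for word in words]


# Hard-coded stand-in for the Counter produced by tokens.py above.
counter = Counter({"if": 1, "elif": 1, "else": 1})

for line in format_counts(counter, wordsToFind):
    print(line)
```

This keeps the words in the same order as wordsToFind instead of the order in which they were first seen.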

Advantages

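  • The input is tokenized as Python source, so malformed input (for example, an unclosed bracket or an unterminated triple-quoted string) raises an error instead of being miscounted. Note that tokenize only splits the text into tokens; it does not validate the full syntax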

  • Only actual Python keywords are counted, not every occurrence of "if" in the code text. For example, you can have a line like

    a = "if inside str"

    The if inside that string literal is not counted, which I think is the right behavior
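To make the second point concrete, here is a small sketch that compares a token-based count with a naive substring count on the same text. The function names count_keywords and count_substrings and the sample string are assumptions for illustration. Unlike the script above, it uses tokenize.generate_tokens on a str and additionally checks that the token type is NAME.

```python
import io
import tokenize
from collections import Counter

WORDS_TO_FIND = ["if", "elif", "else", "for", "while"]


def count_keywords(source, words=WORDS_TO_FIND):
    """Count NAME tokens in `source` whose text is in `words`."""
    wanted = set(words)
    # generate_tokens works on str input, so no encoding step is needed.
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return Counter(
        tok.string
        for tok in tokens
        if tok.type == tokenize.NAME and tok.string in wanted
    )


def count_substrings(source, words=WORDS_TO_FIND):
    """Naive baseline for comparison: count raw substring matches."""
    counts = {word: source.count(word) for word in words}
    return Counter({word: n for word, n in counts.items() if n})


# Hypothetical sample: "if" appears in a string, "else" only in a comment.
sample = 'a = "if inside str"\nif a:\n    pass  # else is only a comment\n'

print(count_keywords(sample))    # Counter({'if': 1})
print(count_substrings(sample))  # Counter({'if': 2, 'else': 1})
```

The substring count picks up the "if" inside the string literal and the "else" inside the comment, while the token-based count only sees the real if statement.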

Answered By: Jorge Morgado