How can I count occurrences of words specified in an array in Python?
Question:
I am working on a small program in which the user enters text and I would like to check how many times the given words occur in the given input.
import sys

# Read user input
print("Input your code:\n")
user_input = sys.stdin.read()
print(user_input)
For example, the text that I enter into the program is:
a=1
b=3
if (a == 1):
    print("A is a number 1")
elif(b == 3):
    print ("B is 3")
else:
    print("A isn't 1 and B isn't 3")
The words to look for are specified in a list:
wordsToFind = ["if", "elif", "else", "for", "while"]
Basically, I would like to print how many times "if", "elif" and "else" occur in the input.
How can I count occurrences of words like "if", "elif", "else", "for", "while" in a string entered by the user?
Answers:
I think the best option is to use tokenize, a module in Python's standard library:
# Let's say this is tokens.py
import sys
from collections import Counter
from io import BytesIO
from tokenize import tokenize
# Get input from stdin
code_text = sys.stdin.read()
# Tokenize the input as python code
tokens = tokenize(BytesIO(code_text.encode("utf-8")).readline)
# Filter the ones in wordsToFind
wordsToFind = ["if", "elif", "else", "for", "while"]
words = [token.string for token in tokens if token.string in wordsToFind]
# Count the occurrences
counter = Counter(words)
print(counter)
Test
Let's say you have a test.py:
a=1
b=3
if (a == 1):
    print("A is a number 1")
elif(b == 3):
    print ("B is 3")
else:
    print("A isn't 1 and B isn't 3")
and then you run:
cat test.py | python tokens.py
Output:
Counter({'if': 1, 'elif': 1, 'else': 1})
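If you would rather not pipe a file through the shell, the same idea can be wrapped in a function that takes the code as a string. This is only a minimal sketch: the name count_keywords and the sample text are made up for illustration, and it uses tokenize.generate_tokens (which reads str lines, so no encoding step is needed) and additionally checks that each token is a NAME token.

```python
import io
import tokenize
from collections import Counter

WORDS_TO_FIND = ["if", "elif", "else", "for", "while"]


def count_keywords(source, words=WORDS_TO_FIND):
    """Count NAME tokens in `source` whose text is one of `words`.

    `count_keywords` is a hypothetical helper, not part of any library.
    """
    wanted = set(words)
    # generate_tokens expects a readline callable that returns str lines
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return Counter(
        tok.string
        for tok in tokens
        # Keywords are reported as NAME tokens by the tokenize module
        if tok.type == tokenize.NAME and tok.string in wanted
    )


sample = (
    "a = 1\n"
    "b = 3\n"
    "if (a == 1):\n"
    "    print('A is a number 1')\n"
    "elif (b == 3):\n"
    "    print('B is 3')\n"
    "else:\n"
    "    print(\"A isn't 1 and B isn't 3\")\n"
)

print(count_keywords(sample))
```

Words that never occur are simply absent from the Counter, and looking them up (for example `count_keywords(sample)["for"]`) gives 0.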
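Advantages
- The input is tokenized as Python source, so tokenize raises an error on text it cannot tokenize (for example, an unterminated triple-quoted string), although it does not run a full syntax check.
- Only real tokens are counted, not every occurrence of "if" in the text. For example, you can have a line like
a = "if inside str"
and I think that if should not be counted.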
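To illustrate the second point, here is a small sketch comparing a plain text search with the token-based count. The sample source and variable names are assumptions made up for this comparison, and the regular expression is just one possible naive approach.

```python
import io
import re
import tokenize
from collections import Counter

# Hypothetical sample: "if" appears in a string, in a comment, and as a keyword
source = (
    'a = "if inside str"  # if in a comment\n'
    "if a:\n"
    "    pass\n"
)

# Naive approach: match the word anywhere in the text
naive = Counter(re.findall(r"\bif\b", source))

# Token-based approach: the string and the comment are each a single
# STRING / COMMENT token, so only the real keyword is a NAME token
token_based = Counter(
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.NAME and tok.string == "if"
)

print(naive["if"])        # 3
print(token_based["if"])  # 1
```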