Python: How to determine if a string has an exact match with any string from the list


Assume that I have the list of phrases to compare against as:
["hello", "hi", "bye"]

I want to return true if my text contains any of these words as an exact match. That is, "hi there, how are you?" should return true, but "hithere, how are you?" should return false.

So far I have the below code:

phrases = ['hello', 'hi', 'bye']    
def match(text: str) -> bool:
    if any(ext in text for ext in phrases):
        return True
    return False

But it returns true for both inputs.

I also found the function below, which returns the exact matches from a string, but I want to compare against a list of phrases, not a single string. I know I can iterate through the list of words and check them one by one, but I am hoping to find a better-performing solution.

import re
print(re.findall(r'\bhello\b', "hellothere, how are you?"))

Update: By exact match, I mean a word boundary. That can be a space, punctuation, etc., just like what \b matches in a regex.

Asked By: Josh



One possible solution is to first split() the sentence into words, then strip() any punctuation marks and the like from each word, and finally check whether that word is in the list. Actually, you should use a set rather than a list: a set allows lookups in constant (O(1)) time on average, instead of the linear (O(n)) time a list needs.

phrases = ['hello', 'hi', 'bye']
phraseSet = set(phrases)

def match(text: str, word_set: set[str]) -> bool:
    words = text.split(" ")
    for word in words:
        stripped = word.strip(".?!,:")
        if stripped in word_set:
            return True
    return False

print(match("hi there, how are you?", phraseSet))
print(match("hithere, how are you?", phraseSet))

Obviously, one could write the above solution in a more Pythonic way.
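As a rough sketch of what a more compact version might look like: the variant below splits on any whitespace, strips every character in string.punctuation (an assumption about which punctuation matters), and uses any(). The name match_words is made up for this example.

```python
import string

phrases = {"hello", "hi", "bye"}  # a set gives O(1) average-time lookups


def match_words(text: str, word_set: set[str]) -> bool:
    # Hypothetical helper: split on any whitespace, trim leading and
    # trailing punctuation from each token, then test set membership.
    return any(token.strip(string.punctuation) in word_set for token in text.split())


print(match_words("hi there, how are you?", phrases))  # True
print(match_words("hithere, how are you?", phrases))   # False
```

Note that this comparison is case-sensitive and only trims punctuation at the ends of a token, so "hi-there" does not count as containing "hi", whereas a regex \b boundary would treat it as a match.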

Answered By: Mushroomator

A regex of the form r"(abc|ef|xxx)" matches "abc", "ef", or "xxx". You can build such a regex by joining the phrases with "|", as below.
Note that re.search returns None if no match is found.

import re

phrases = ['hello', 'hi', 'bye']
def match(text):
  r = re.search(r'\b({})\b'.format("|".join(phrases)), text)
  return r is not None

match("hi there, how are you?"), match("hithere, how are you?")
# (True, False)
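If the phrase list might contain regex metacharacters such as "." or "+", one possible variant escapes each phrase with re.escape and compiles the pattern once, so it is not rebuilt on every call. This is only a sketch; the names PATTERN and match_escaped are made up for this example.

```python
import re

phrases = ["hello", "hi", "bye"]

# Build the alternation once; re.escape guards against phrases that
# contain regex metacharacters such as "." or "+".
PATTERN = re.compile(r"\b(?:{})\b".format("|".join(map(re.escape, phrases))))


def match_escaped(text: str) -> bool:
    # Pattern.search returns a match object, or None when nothing matches.
    return PATTERN.search(text) is not None


print(match_escaped("hi there, how are you?"))  # True
print(match_escaped("hithere, how are you?"))   # False
```

Keep in mind that \b sits between a word character and a non-word character, so "hi-there" counts as a match here, and phrases that begin or end with punctuation may not be bounded the way you expect.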
Answered By: Kota Mori

Depending on your exact needs, you can tweak this, but I think this does what you need:

import re

phrases = ['hello', 'hi', 'bye']
text = "Hi there, how are you? How did that Hi8 turn out? Hi, can you hear me? Hello? Uh... Bye!"
expression = rf'(?:^|(?<=\s))(?:{"|".join(phrases)})(?=[,\.!?;:\s]|$)'

result = re.findall(expression, text, flags=re.IGNORECASE)
print(result)

Result:

['Hi', 'Hi', 'Hello', 'Bye']

About that regular expression:

  • (?:^|(?<=\s)) says: in a non-capturing group ((?: )), check that this is the start of the string, or that the previous character is a whitespace character.
  • (?:{"|".join(phrases)}) Since the expression is an f-string (and a raw string, rf'something'), the part between {} gets replaced by evaluating the Python expression, so hello|hi|bye in this case. The expression will match any of the words, in a non-capturing group.
  • (?=[,\.!?;:\s]|$) and at the end, there’s a lookahead checking that the next character is either punctuation or whitespace, or that the end of the string follows. (Note that the . is escaped with a backslash; outside a character class an unescaped . would match "any character", although inside [...] the escape is optional.)
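To get the True/False result the question asks for, the same expression could be compiled once and wrapped in a function, roughly as sketched below. The names EXPRESSION and contains_phrase are made up for this example, and the re.escape call is an added precaution that is not part of the expression above.

```python
import re

phrases = ["hello", "hi", "bye"]

# Escape each phrase so regex metacharacters in a phrase are taken literally.
alternation = "|".join(map(re.escape, phrases))

# Preceded by start-of-string or whitespace; followed by one of the
# listed punctuation marks, whitespace, or end-of-string.
EXPRESSION = re.compile(
    rf"(?:^|(?<=\s))(?:{alternation})(?=[,\.!?;:\s]|$)",
    flags=re.IGNORECASE,
)


def contains_phrase(text: str) -> bool:
    # Hypothetical helper: True if any phrase occurs as a stand-alone word.
    return EXPRESSION.search(text) is not None


print(contains_phrase("Hi there, how are you?"))  # True
print(contains_phrase("hithere, how are you?"))   # False
```

Compared with \b, this is stricter: only whitespace counts as a boundary before a word, and only the listed punctuation or whitespace after it, so "hi-there" and "Hi8" are both rejected.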
Answered By: Grismar