How to extract sentences of text that contain keywords in list

Question:

I’m trying to return all sentences that contain any of the words in a list, but the result only contains the sentence for the second word in the list. In the example below, I wanted to extract the sentences that contain inflation or commodity, not just commodity. Any help would be appreciated.

text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
words = ['inflation', 'commodity']
for word in words:
    [words.casefold() for words in words] #to ignore cases in text

def extract_word(text):

    return [sentence for sentence in text.split('.') if word in sentence]

extract_word(text)

[' commodity prices are rising a lot']
Asked By: slowgoyo

Answers:

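The condition if word in sentence checks whether the loop variable word from the for loop is in sentence. The for loop has already finished by the time extract_word is called, and "commodity" is the last element in the list words, so word is still bound to the string "commodity". Only sentences containing that one word are returned.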
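A minimal sketch of that behaviour, reusing the names from the question (the loop body is replaced by pass here, since only the final binding of word matters):

```python
text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
words = ['inflation', 'commodity']

for word in words:
    pass  # the loop body does not matter; only the final binding of `word` does

# After the loop, `word` is still bound to the last element of `words`.
print(word)  # commodity

def extract_word(text):
    # `word` is looked up as a global when the function runs,
    # so only sentences containing 'commodity' are kept.
    return [sentence for sentence in text.split('.') if word in sentence]

result = extract_word(text)
print(result)  # [' commodity prices are rising a lot']
```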

Instead, inside the list comprehension you can check whether any of the elements in words is in sentence, as below:

text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
words = ['inflation', 'commodity']
sentences = [
    sentence for sentence in text.split(".") if any(
        w.lower() in sentence.lower() for w in words
    )
]

print(sentences)
# >>> ['inflation is very high', ' commodity prices are rising a lot']
Answered By: Jacob Lee

You can try this:

texts = ' commodity prices are rising a lot. some random text. this text contains the word: inflation'
words = ['inflation','commodity']
lst_words = [w.casefold() for w in words]  # casefold the keywords to ignore case

def found_word(sentence, lst_words):
    # casefold the sentence too, otherwise the comparison is still case-sensitive
    return any(word in lst_words for word in sentence.casefold().split())

def extract_word(text):
    lst_sentences = []
    for sentence in text.split('.'):
        if found_word(sentence, lst_words):
            lst_sentences.append([sentence + '.'])
    return lst_sentences

extract_word(texts)
# [[' commodity prices are rising a lot.'],
#  [' this text contains the word: inflation.']]

It is a bit longer, but I think it is much easier to read.
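One caveat: comparing against sentence.split() matches whole whitespace-separated tokens only, so a keyword followed by punctuation (for example "inflation,") is missed, while a plain substring test would also match inside longer words such as "hyperinflation". A possible middle ground is a word-boundary regular expression from the standard re module. This is only a sketch under those assumptions; the function name extract_sentences and the sample text are made up for illustration and are not part of the original answer.

```python
import re

def extract_sentences(text, words):
    # Build one case-insensitive pattern that matches any keyword as a whole word.
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b',
        re.IGNORECASE,
    )
    # Split on '.', strip surrounding whitespace, and skip empty pieces.
    sentences = (s.strip() for s in text.split('.'))
    return [s for s in sentences if s and pattern.search(s)]

# Hypothetical sample text chosen to show punctuation and a longer word.
sample = 'Inflation, they say, is high. Hyperinflation is worse. Commodity prices rose.'
found = extract_sentences(sample, ['inflation', 'commodity'])
print(found)
# ['Inflation, they say, is high', 'Commodity prices rose']
```

Whether "hyperinflation" should count as a hit depends on the use case; if it should, the substring test is the simpler choice.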

Answered By: Andreas

I find generators to be pretty handy in these kinds of cases.

def extract_word(text):
    words = ['inflation', 'commodity']
    sentences = text.split('.')
    for sentence in sentences:
        if any(word in sentence for word in words):
            yield sentence

>>> list(extract_word('inflation is very high. commodity prices are rising a lot. this is an extra sentence'))
['inflation is very high', ' commodity prices are rising a lot']

It’s readable, and it is easy to see what the outcome is.
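As a small variation, the keyword list could be passed in as an argument instead of being hard-coded, and because a generator produces results lazily, next() can fetch just the first match. The name extract_matching, the case-insensitive comparison, and the strip() call are assumptions added for illustration.

```python
def extract_matching(text, words):
    # Casefold the keywords once so the comparison ignores case.
    folded = [w.casefold() for w in words]
    for sentence in text.split('.'):
        if any(w in sentence.casefold() for w in folded):
            # Strip the leading space left over from splitting on '.'.
            yield sentence.strip()

text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
matches = extract_matching(text, ['Inflation', 'commodity'])

first = next(matches)  # only the first matching sentence is computed here
rest = list(matches)   # the remaining matches

print(first)  # inflation is very high
print(rest)   # ['commodity prices are rising a lot']
```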

Answered By: miksus