How to extract sentences of text that contain keywords in list

Question

I’m trying to return all sentences that contain any of the words in a list, but the result only contains the sentence for the second word in the list. In the example below, I wanted to extract the sentences containing inflation and commodity, not just commodity. Any help would be appreciated.

text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
words = ['inflation', 'commodity']
for word in words:
    [words.casefold() for words in words] #to ignore cases in text

def extract_word(text):

    return [sentence for sentence in text.split('.') if word in sentence]

extract_word(text)

[' commodity prices are rising a lot']

Asked By: slowgoyo

Answer 1

The condition if word in sentence checks whether word, the loop variable left over from the for loop, is in sentence. Since "commodity" is the last element of the list words, word still holds the string "commodity" once the loop has finished, so only that sentence matches.
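A minimal sketch of that behaviour, reusing the question's list (the name last_word is only for illustration): a Python for loop leaves its variable bound to the final item after the loop ends.

```python
words = ['inflation', 'commodity']

for word in words:
    pass  # the loop body does not matter here

# After the loop, `word` is still bound to the last element it visited.
last_word = word
print(last_word)  # commodity
```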

Instead, inside the list comprehension, you can check whether any of the elements in words is in sentence, as below:

text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
words = ['inflation', 'commodity']
sentences = [
    sentence for sentence in text.split(".") if any(
        w.lower() in sentence.lower() for w in words
    )
]

print(sentences)
# >>> ['inflation is very high', ' commodity prices are rising a lot']
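The leading space in the second result comes from splitting on "." alone. If that matters, one possible variation is sketched below. It is not part of the original answer, and extract_sentences is a hypothetical name. It strips each sentence, skips empty ones, and casefolds the keywords once up front:

```python
def extract_sentences(text, words):
    """Return the sentences of `text` that contain any keyword in `words`."""
    # Casefold the keywords once instead of on every comparison.
    keywords = [w.casefold() for w in words]
    # Strip surrounding whitespace left over from splitting on '.'.
    sentences = (s.strip() for s in text.split('.'))
    return [
        s for s in sentences
        if s and any(k in s.casefold() for k in keywords)
    ]

text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
result = extract_sentences(text, ['Inflation', 'commodity'])
print(result)
# ['inflation is very high', 'commodity prices are rising a lot']
```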

Answered By: Jacob Lee

Answer 2

You can try this:

texts = ' commodity prices are rising a lot. some random text. this text contains the word: inflation'
words = ['inflation','commodity']
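lst_words = [w.casefold() for w in words]  # casefold the keywords so matching ignores case

def found_word(sentence, lst_words):
    # casefold each word of the sentence too, so 'Inflation' matches 'inflation'
    return any(word.casefold() in lst_words for word in sentence.split())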

def extract_word(text):
    lst_sentences = []
    for sentence in text.split('.'):
        if found_word(sentence, lst_words):
            lst_sentences.append([sentence + '.'])
    return lst_sentences

extract_word(texts)
# [[' commodity prices are rising a lot.'],
#  [' this text contains the word: inflation.']]

It is a bit longer, but I think it is much easier to read.
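Because found_word compares whole whitespace-separated tokens, a keyword followed by punctuation (for example "inflation,") would not match. A hedged sketch of one alternative, using the standard re module with word boundaries; contains_keyword is a hypothetical helper, not part of the answer above:

```python
import re

def contains_keyword(sentence, words):
    """Return True if any keyword appears in `sentence` as a whole word."""
    if not words:
        return False  # an empty alternation would match at every word boundary
    # re.escape guards against keywords that contain regex metacharacters.
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

print(contains_keyword('Inflation, they say, is high', ['inflation', 'commodity']))  # True
print(contains_keyword('an inflationary trend', ['inflation']))  # False
```

Whether "inflationary" should count as a match depends on the use case; plain substring matching with in would accept it, while this sketch does not.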

Answered By: Andreas

Answer 3

I find generators to be pretty handy in these kinds of cases.

def extract_word(text):
    words = ['inflation', 'commodity']
    sentences = text.split('.')
    for sentence in sentences:
        if any(word in sentence for word in words):
            yield sentence

>>> list(extract_word('inflation is very high. commodity prices are rising a lot. this is an extra sentence'))
['inflation is very high', ' commodity prices are rising a lot']

It’s readable and easy to understand what the outcome is.
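One thing a generator allows is stopping at the first match without scanning the rest of the text. A sketch, assuming a variant that takes the keywords as a parameter (iter_matching_sentences is a hypothetical name):

```python
def iter_matching_sentences(text, words):
    """Lazily yield each sentence of `text` that contains any keyword."""
    for sentence in text.split('.'):
        if any(word in sentence for word in words):
            yield sentence

text = 'inflation is very high. commodity prices are rising a lot. this is an extra sentence'
matches = iter_matching_sentences(text, ['inflation', 'commodity'])

# next() pulls only the first match; the default None covers the no-match case.
first = next(matches, None)
print(first)  # inflation is very high
```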

Answered By: miksus
