How do you detect a keyword in a sentence, regardless of tense or form, in Python?

Question:

I am trying to use spaCy in Python to detect the word "grief" no matter the form, whether it is "I am grieving", "going through grief", "I grieved over __", in all caps, etc. I am pretty new to Python, so I don't know lemmatization that well, but are there some simple if statements that could solve it using spaCy?

grief = str(input(("What is currently on your mind? ")))
doc = nlp(grief)
if [t.grief for t in doc if t.lemma_ == "grie"]:
    grief1(sad_value)
Asked By: Alvino123

Answers:

To make use of the spaCy lemmatiser, you need to check for two lemmas: "grief" and "grieve". However, this doesn’t catch all cases as one might initially expect (see below).

In general, do not assume that the spaCy lemma output is lowercase, or that unusual capitalisation in the input word leaves the result unchanged. For example,

  • "Grief is what I feel" outputs "Grief" for the lemma instead of "grief".
  • "I am gRieviNG" outputs "grieving" instead of the expected (correct) lemma "grieve" (which is what one gets if "I am grieving" is capitalised normally).

This Medium article by Jade Moillic highlights the lowercase limitations of the spaCy lemmatiser quite well.

To handle these situations, one can force the lemma output to lowercase and also add "grieving" to the set of lemmas to check. Alternatively, stemming with NLTK's SnowballStemmer is a more robust option, since the verb forms "grieve", "grieving", "grieved" and "grieves" all reduce to the stem "griev". Solutions are as follows.

spaCy Lemmatiser-based Solution

import spacy

# Load the small English pipeline; NER is not needed for lemmatisation
nlp = spacy.load('en_core_web_sm', exclude=["ner"])
grief = input("What is currently on your mind? ")
# Input: "I am grieving"
doc = nlp(grief)
for t in doc:
    # Lowercase the lemma, since spaCy may keep the input's capitalisation
    lem = t.lemma_.lower()
    if lem in ("grief", "grieve", "grieving"):
        print("Found {}".format(lem))
# Output: "Found grieve"
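
The check above can be wrapped in a small helper so that the calling code needs only a single if statement. This is a minimal sketch, not part of spaCy: the names GRIEF_LEMMAS and mentions_grief are hypothetical, and the sample lists are hand-written stand-ins for lemmatiser output so that the snippet runs without a model installed.

```python
# Hypothetical helper: the lemmas to accept, all lowercase.
GRIEF_LEMMAS = frozenset({"grief", "grieve", "grieving"})


def mentions_grief(lemmas, targets=GRIEF_LEMMAS):
    """Return True if any lemma, once lowercased, is one of the targets."""
    return any(lemma.lower() in targets for lemma in lemmas)


# With spaCy, the lemmas would come from a parsed doc, for example:
#   doc = nlp(text)
#   if mentions_grief(t.lemma_ for t in doc): ...
# The lists below are illustrative stand-ins for such output.
print(mentions_grief(["Grief", "be", "what", "I", "feel"]))  # True
print(mentions_grief(["I", "be", "grieving"]))               # True
print(mentions_grief(["I", "be", "happy"]))                  # False
```

Keeping the accepted lemmas in one set means further forms can be added in a single place if the lemmatiser turns out to miss other inputs.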

Examples for testing with spaCy Lemmatiser

import spacy

nlp = spacy.load('en_core_web_sm', exclude=["ner"])
texts = ["Grief is what I feel", "Grieving is not something I'm used to", "I am grieving", "Going through grief", "I will grieve", "I grieved", "He grieves", "I am gRieviNG"]
docs = list(nlp.pipe(texts))
for doc in docs:
    print(doc.text)
    for t in doc:
        lem = t.lemma_.lower()
        if lem in ("grief", "grieve", "grieving"):
            print("\t-> Found {}".format(lem))

# Output
# Grief is what I feel
#         -> Found grief
# Grieving is not something I'm used to
#         -> Found grieve
# I am grieving
#         -> Found grieve
# Going through grief
#         -> Found grief
# I will grieve
#         -> Found grieve
# I grieved
#         -> Found grieve
# He grieves
#         -> Found grieve
# I am gRieviNG
#         -> Found grieving

Stemming-based Solution

from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize

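stemmer = SnowballStemmer(language='english')
grief = input("What is currently on your mind? ")
# word_tokenize needs NLTK's Punkt tokenizer data to be downloaded first
for token in word_tokenize(grief):
    # The verb forms all stem to "griev"; the noun stays "grief"
    stem = stemmer.stem(token)
    if stem in ('grief', 'griev'):
        print("Found {}".format(stem))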
Answered By: Kyle F Hartzenberg