How to identify body part names in a text with Python

Question:

I am trying to determine whether an entity is a body part. For example, in "Other specified disorders of the right ear," I want to identify "the right ear" as an entity. I tried some named entity recognition methods, but they identify all entities, not just body parts. I tried scispacy, but I have not managed to get it working so far. I also tried concise_concepts from spacy to create a separate entity type for body parts, but that didn’t work either. Please guide me through how I can do this; a code snippet would be appreciated.

Asked By: Moein Hasani


Answers:

This code, based on this answer, detects almost everything correctly, but it needs lemmatization.

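import nltk
from nltk.corpus import wordnet as wn

nltk.download('wordnet')  # fetch the WordNet corpus on first use

# synset for "body part"; candidates are checked against it
part = wn.synsets('body_part')[0]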

def is_body_part(candidate):
    for ss in wn.synsets(candidate):
        # only get those where the synset matches exactly
        name = ss.name().split(".", 1)[0]
        if name != candidate:
            continue
        hit = part.lowest_common_hypernyms(ss)
        if hit and hit[0] == part:
            return True
    return False

# true things
for word in ("finger", "hand", "head", "feet", "foot", "hair"):
    print(is_body_part(word), word, sep="\t")

# false things
for word in ("cat", "dog", "fish", "cabbage", "knife"):
    print(is_body_part(word), word, sep="\t")

Output:

True    finger
True    hand
True    head
False   feet # have to lemmatize it to foot
True    foot
True    hair
False   cat
False   dog
False   fish
False   cabbage
False   knife
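
The False result for "feet" follows from the exact-name check in is_body_part: the plural string never equals a synset name such as "foot". One possible way to handle this is to lemmatize each token before the lookup. The following is a minimal sketch, not a definitive implementation. The names find_body_parts, toy_lemmatize, toy_is_body_part, TOY_LEMMAS and TOY_BODY_PARTS are hypothetical, and the toy lookup tables are stand-ins so the sketch runs without downloading WordNet data.

```python
import re


def find_body_parts(text, is_body_part, lemmatize=lambda word: word):
    """Return (token, lemma) pairs for tokens whose lemma passes is_body_part."""
    hits = []
    # Lowercase the text and keep alphabetic tokens only
    for token in re.findall(r"[a-z]+", text.lower()):
        lemma = lemmatize(token)  # e.g. "feet" -> "foot"
        if is_body_part(lemma):
            hits.append((token, lemma))
    return hits


# Toy stand-ins (assumptions, not real WordNet data) so the sketch is runnable
TOY_LEMMAS = {"feet": "foot", "ears": "ear"}
TOY_BODY_PARTS = {"foot", "ear", "hand"}


def toy_lemmatize(word):
    # Map a known plural to its singular; leave other words unchanged
    return TOY_LEMMAS.get(word, word)


def toy_is_body_part(word):
    return word in TOY_BODY_PARTS


print(find_body_parts("Pain in both feet and the right ear",
                      toy_is_body_part, toy_lemmatize))
```

With NLTK, WordNetLemmatizer().lemmatize from nltk.stem could be passed as the lemmatize argument and the is_body_part function above as the predicate. This sketch only matches single words, so "right ear" would be reported as "ear".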
Answered By: Moein Hasani