Extract proper nouns and corresponding sentences from a dataset using python

Question:

I have a dataset containing a list of sentences that have both proper nouns and common nouns in them. Example –

  1. Google is a website
  2. the universe is expanding constantly
  3. I wish I had bagels for bReakfasT
  4. the GUITAR sounded a bit off-key
  5. The rumors of Moira Rose’s death are greatly exaggerated online
  6. Mahatma Gandhi was a national treasure
  7. i strongly believe that beyonce is overrated

The casing of the dataset can also be mixed.

I want to extract all the proper nouns AND the corresponding sentences where they appear in two separate columns –

Output example

Is there any way to do this in Python? I am quite new to concepts of NLP and Python overall. Thanks!

Asked By: Prat1


Answers:

You can try any NLP library with a pretrained language model, such as spaCy or NLTK, as mentioned by @ivanp.

Here I have used a spaCy model:

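    import string

    import pandas as pd
    import spacy

    nlp = spacy.load("en_core_web_sm")  # load the pretrained English model

    def proper_noun_extraction(x):
        prop_noun = []
        # capwords normalises mixed casing ("bReakfasT" -> "Breakfast") before tagging
        doc = nlp(string.capwords(x))
        for tok in doc:
            if tok.pos_ == 'PROPN':  # keep only tokens tagged as proper nouns
                prop_noun.append(tok.text)
        if prop_noun:
            return (' '.join(prop_noun), x)
        return ('no proper noun found', None)

    # df is assumed to be a DataFrame with the sentences in a 'sentence' column
    tuple_noun_sent = df['sentence'].apply(proper_noun_extraction)

    # unpack the (proper_noun, sentence) tuples into two columns
    resultant_df = pd.DataFrame(tuple_noun_sent.tolist(), columns=['proper_noun', 'sentence'])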
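
If you only want the sentences that actually contain a proper noun (rather than a 'no proper noun found' placeholder row), the same idea can be sketched as below. This is a rough variant, not part of the original answer: the names `Tagger`, `spacy_tagger`, `proper_noun_rows` and `toy_tagger` are made up for illustration, and `toy_tagger` is a hypothetical stand-in with a hard-coded word list so the logic can be run without downloading a spaCy model.

```python
import string
from typing import Callable, Iterable, List, Tuple

# A tagger maps a sentence to (token_text, part_of_speech) pairs.
Tagger = Callable[[str], List[Tuple[str, str]]]


def spacy_tagger(model_name: str = "en_core_web_sm") -> Tagger:
    """Build a tagger backed by spaCy (imported lazily; the model must be installed)."""
    import spacy

    nlp = spacy.load(model_name)
    return lambda text: [(tok.text, tok.pos_) for tok in nlp(text)]


def proper_noun_rows(sentences: Iterable[str], tagger: Tagger) -> List[Tuple[str, str]]:
    """Return (proper_nouns, sentence) pairs, skipping sentences with no PROPN token."""
    rows = []
    for sentence in sentences:
        # Same normalisation as the answer above: capitalise each word before tagging.
        tagged = tagger(string.capwords(sentence))
        nouns = [text for text, pos in tagged if pos == "PROPN"]
        if nouns:
            rows.append((" ".join(nouns), sentence))
    return rows


def toy_tagger(text: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for a real tagger: only a fixed word list counts as PROPN."""
    known = {"Google", "Beyonce"}
    return [(word, "PROPN" if word in known else "X") for word in text.split()]


rows = proper_noun_rows(
    [
        "Google is a website",
        "the universe is expanding constantly",
        "i strongly believe that beyonce is overrated",
    ],
    toy_tagger,
)
print(rows)
```

With a real model you would pass `spacy_tagger()` instead of `toy_tagger`, and the resulting list can be turned into the two columns with `pd.DataFrame(rows, columns=['proper_noun', 'sentence'])`. Keep in mind that what gets tagged as a proper noun then depends on the model, so the output is worth spot-checking on your own data.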


Answered By: qaiser