Extract proper nouns and corresponding sentences from a dataset using Python

Question

I have a dataset containing a list of sentences that include both proper nouns and common nouns. Example:

Google is a website
the universe is expanding constantly
I wish I had bagels for bReakfasT
the GUITAR sounded a bit off-key
The rumors of Moira Rose’s death are greatly exaggerated online
Mahatma Gandhi was a national treasure
i strongly believe that beyonce is overrated

The casing of the dataset can also be mixed.

I want to extract all the proper nouns AND the corresponding sentences where they appear in two separate columns –

Output example

Is there any way to do this in Python? I am quite new to concepts of NLP and Python overall. Thanks!

Asked By: Prat1

Answer 1

You can try any NLP library that ships a pretrained language model, such as spaCy or NLTK, as mentioned by @ivanp.

I have used a spaCy model here:

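    import string

    import pandas as pd
    import spacy

    nlp = spacy.load("en_core_web_sm")  # load the small pretrained English pipeline

    def proper_noun_extraction(x):
        """Return (proper nouns joined by spaces, original sentence)."""
        # Normalize mixed casing so that every word starts with a capital letter
        doc = nlp(string.capwords(x))
        # Keep only the tokens that spaCy tags as proper nouns
        prop_noun = [tok.text for tok in doc if tok.pos_ == 'PROPN']
        if prop_noun:
            return (' '.join(prop_noun), x)
        return ('no proper noun found', None)

    # df is the dataset from the question, with the sentences in a 'sentence' column
    tuple_noun_sent = df['sentence'].apply(proper_noun_extraction)

    resultant_df = pd.DataFrame(tuple_noun_sent.tolist(), columns=['proper_noun', 'sentence'])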
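
The function above passes each sentence through string.capwords before tagging, which is what handles the mixed casing in the dataset. A minimal sketch of what that step does to a few of the question's sentences is shown here; normalize_case is a hypothetical helper name used only for illustration. Because every word ends up capitalized, the tagger may also label some ordinary words as PROPN, so it is worth checking the output on your own data.

```python
import string

# A few mixed-case sentences taken from the question
sentences = [
    "I wish I had bagels for bReakfasT",
    "i strongly believe that beyonce is overrated",
    "the GUITAR sounded a bit off-key",
]

def normalize_case(sentence: str) -> str:
    """Capitalize the first letter of each whitespace-separated word
    and lowercase the remaining letters (hypothetical helper)."""
    return string.capwords(sentence)

for s in sentences:
    print(normalize_case(s))
# I Wish I Had Bagels For Breakfast
# I Strongly Believe That Beyonce Is Overrated
# The Guitar Sounded A Bit Off-key
```

Note that the original sentence, not the normalized one, is what gets returned in the second column.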

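The function in this answer joins every PROPN token of a sentence into one string, so a sentence that mentions two people would return their names merged together. If you would rather keep each name separate, one possible approach is to group runs of adjacent PROPN tokens. The sketch below is an assumption-laden illustration, not part of the original answer: group_proper_nouns is a hypothetical helper, and it works on plain (text, tag) pairs, which with spaCy could be built as [(tok.text, tok.pos_) for tok in doc].

```python
from typing import Iterable, List, Tuple

def group_proper_nouns(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Merge runs of adjacent PROPN tokens into single names (hypothetical helper)."""
    names: List[str] = []
    current: List[str] = []
    for text, pos in tagged:
        if pos == "PROPN":
            current.append(text)
        elif current:
            # A non-PROPN token ends the current name
            names.append(" ".join(current))
            current = []
    if current:
        # Flush a name that runs to the end of the sentence
        names.append(" ".join(current))
    return names

# Hand-written tags for illustration only; real tags would come from the tagger.
tagged = [
    ("Mahatma", "PROPN"), ("Gandhi", "PROPN"), ("Met", "VERB"),
    ("Moira", "PROPN"), ("Rose", "PROPN"),
]
print(group_proper_nouns(tagged))  # ['Mahatma Gandhi', 'Moira Rose']
```

Each grouped name could then be placed on its own row next to the sentence it came from.
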
Answered By: qaiser
