spacy

Spacy, how not to remove "not" when cleaning the text with space

Question: I use this spaCy code to later apply it to my text, but I need negative words such as "not" to stay in the text. nlp = spacy.load("en_core_web_sm") def my_tokenizer(sentence): return [token.lemma_ for token in tqdm(nlp(sentence.lower()), leave = False) if token.is_stop == …

Total answers: 1
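
One possible approach to this question, sketched under assumptions rather than taken from the posted answer: filter against a copy of spaCy's English stop-word set with the negations removed, instead of relying on token.is_stop. The names NEGATIONS, FILTERED_STOP_WORDS and tokenize_keep_negations are hypothetical. spacy.blank("en") stands in for en_core_web_sm so the sketch runs without a model download, and it returns token text rather than lemmas because a blank pipeline has no lemmatizer.

```python
import spacy
from spacy.lang.en.stop_words import STOP_WORDS

# Hypothetical list of negations to keep; extend it as needed.
NEGATIONS = {"not", "no", "never"}

# Stop words that will still be filtered out (negations excluded).
FILTERED_STOP_WORDS = STOP_WORDS - NEGATIONS

# Assumption: a blank English pipeline (tokenizer only) is enough here.
nlp = spacy.blank("en")


def tokenize_keep_negations(sentence):
    """Lowercase, tokenize, and drop punctuation and non-negation stop words."""
    return [
        token.text
        for token in nlp(sentence.lower())
        if not token.is_punct and token.text not in FILTERED_STOP_WORDS
    ]


print(tokenize_keep_negations("This is not a good movie."))
```

With a full pipeline such as en_core_web_sm, token.text in the list comprehension could be swapped for token.lemma_ to match the original code.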

How to identify body part names in a text with python

Question: I am trying to specify whether an entity is a body part. For example, in "Other specified disorders of the right ear," I want to be able to identify the right ear as an entity. I tried some named entity recognition methods but …

Total answers: 1

How to create a list of tokenized words from dataframe column using spaCy?

Question: I’m trying to apply spaCy's tokenizer to a dataframe column to get a new column containing a list of tokens. Assume we have the following dataframe: import pandas as pd details = { 'Text_id' : [23, 21, 22, 21], 'Text' : ['All roads …

Total answers: 2

How to get all stop words from Spacy and don't get any errors? TypeError: argument of type 'module' is not iterable

Question: How can I get all stop words from spacy.lang.en without getting any errors? from spacy.lang.en import stop_words as stop_words def tokenize(sentence): sentence = nlp(sentence) # lemmatizing sentence = [ word.lemma_.lower().strip() if word.lemma_ != …

Total answers: 1
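
The TypeError in the title is consistent with testing membership against the stop_words module itself rather than the set it defines. A minimal sketch, assuming that is the cause: import the STOP_WORDS set from spacy.lang.en.stop_words and test against that. The helper remove_stop_words is a hypothetical name used only for illustration.

```python
from spacy.lang.en.stop_words import STOP_WORDS  # a set of strings, not a module


def remove_stop_words(words):
    """Return the words that are not English stop words (case-insensitive)."""
    return [word for word in words if word.lower() not in STOP_WORDS]


print(remove_stop_words(["The", "cat", "sat"]))
```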

Extracting start and end indices of a token using spacy

Question: I am looking at lots of sentences and looking to extract the start and end indices of a word in a given sentence. For example, the input is as follows: "This is a sentence written in English by a native English speaker." And What …

Total answers: 3
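
As an illustration of one way to get character offsets (not necessarily what the answers propose): each spaCy token exposes idx, the character offset of its first character in the original text, so the end offset is idx plus the token length. The function name token_spans is hypothetical, and a blank English pipeline is assumed because only the tokenizer is needed.

```python
import spacy

# Assumption: tokenization alone is sufficient, so no trained model is loaded.
nlp = spacy.blank("en")


def token_spans(text):
    """Return (token text, start offset, end offset) for every token."""
    return [
        (token.text, token.idx, token.idx + len(token.text))
        for token in nlp(text)
    ]


sentence = "This is a sentence written in English by a native English speaker."
for text, start, end in token_spans(sentence):
    # The end offset is exclusive, so slicing recovers the token text.
    assert sentence[start:end] == text
    print(text, start, end)
```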

Error while loading vector from Glove in Spacy

Question: I am facing the following attribute error when loading a GloVe model. Code used to load the model: nlp = spacy.load('en_core_web_sm') tokenizer = spacy.load('en_core_web_sm', disable=['tagger','parser', 'ner', 'textcat']) nlp.vocab.vectors.from_glove('../models/GloVe') Getting the following attribute error when trying to load the GloVe model: AttributeError: 'spacy.vectors.Vectors' object has no attribute 'from_glove' Have …

Total answers: 2

Save and load nlp results in spacy

Question: I want to use spaCy to analyze many small texts and I want to store the nlp results for further use to save processing time. I found code at Storing and Loading spaCy Documents Containing Word Vectors but I get an error and I cannot find how …

Total answers: 3
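
A minimal sketch of one way to store processed documents, assuming spaCy v3 and its DocBin container; this is not necessarily the method the answers use. The helper names docs_to_bytes and docs_from_bytes are hypothetical, and a blank pipeline stands in for whatever model was actually used.

```python
import spacy
from spacy.tokens import DocBin

# Assumption: the same pipeline (and therefore vocab) is available when loading.
nlp = spacy.blank("en")


def docs_to_bytes(texts):
    """Process the texts and serialize the resulting Doc objects."""
    doc_bin = DocBin()
    for doc in nlp.pipe(texts):
        doc_bin.add(doc)
    return doc_bin.to_bytes()


def docs_from_bytes(data):
    """Rebuild Doc objects from bytes produced by docs_to_bytes."""
    return list(DocBin().from_bytes(data).get_docs(nlp.vocab))


payload = docs_to_bytes(["First small text.", "Second small text."])
for doc in docs_from_bytes(payload):
    print(doc.text)
```

The bytes can be written to a file and read back later. Word vectors live in the pipeline's vocab rather than in the DocBin, so the pipeline has to be loaded again before the documents are restored.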

Make Spacy tokenizer not split on /

Question: How do I modify the English tokenizer to prevent splitting tokens on the '/' character? For example, the following string should be one token: import spacy nlp = spacy.load('en_core_web_md') doc = nlp("12/AB/568793") for t in doc: print(f"[{t.pos_} {t.text}]") # produces #[NUM 12] #[SYM /] #[ADJ AB/568793] Asked …

Total answers: 2
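
One possible approach, sketched under the assumption of spaCy v3 (the posted answers may instead edit the infix rules): set the tokenizer's token_match to a function that recognizes the identifiers that should stay whole. The pattern SLASH_ID and the helper tokenize are hypothetical, and a blank English pipeline is used so that no model download is required.

```python
import re

import spacy

nlp = spacy.blank("en")

# Hypothetical pattern: digits followed by one or more "/segment" parts.
# Any whitespace-delimited string matching it is kept as a single token.
SLASH_ID = re.compile(r"^\d+(?:/[A-Za-z0-9]+)+$")
nlp.tokenizer.token_match = SLASH_ID.match


def tokenize(text):
    """Return the token texts produced by the customized tokenizer."""
    return [token.text for token in nlp(text)]


print(tokenize("Ref 12/AB/568793 filed"))
```

This leaves the default handling of '/' in place for everything the pattern does not match, at the cost of having to describe the identifiers with a regular expression.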

Using spacy to redact names from a column in a data frame

Question: I have a data frame named "df1". This data frame has 12 columns. The last column in this data frame is called notes. I need to find common names like "john, sally and richard" in this column and replace the values with …

Total answers: 2