Custom NERs training with spaCy 3 throws ValueError

Question:

I am trying to add custom NER labels using spaCy 3. I found tutorials for older versions and adapted them for spaCy 3. Here is the whole code I am using:

import random
import spacy
from spacy.training import Example

LABEL = 'ANIMAL'
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("Do they bite?", {'entities': []}),
    ("horses are too tall and they pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("horses pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("they pretend to care about your feelings, those horses", {'entities': [(48, 54, LABEL)]}),
    ("horses?", {'entities': [(0, 6, LABEL)]})
]
nlp = spacy.load('en_core_web_sm')  # load existing spaCy model
ner = nlp.get_pipe('ner')
ner.add_label(LABEL)
print(ner.move_names) # Here I see, that the new label was added
optimizer = nlp.create_optimizer()
# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):  # only train NER
    for itn in range(20):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            doc = nlp(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
        print(losses)
# test the trained model # add some dummy sentences with many NERs

test_text = 'Do you like horses?'
doc = nlp(test_text)
print("Entities in '%s'" % test_text)
for ent in doc.ents:
    print(ent.label_, " -- ", ent.text)

This code raises a ValueError, but only after 2 iterations (notice the first 2 lines of output):

{'ner': 9.862242701536594}
{'ner': 8.169456698315201}
Traceback (most recent call last):
  File ".\custom_ner_training.py", line 46, in <module>
    nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
  File "C:ogrmojepythonspacy_pgmyvenvlibsite-packagesspacylanguage.py", line 1106, in update
    proc.update(examples, sgd=None, losses=losses, **component_cfg[name])
  File "spacy\pipeline\transition_parser.pyx", line 366, in spacy.pipeline.transition_parser.Parser.update
  File "spacy\pipeline\transition_parser.pyx", line 478, in spacy.pipeline.transition_parser.Parser.get_batch_loss
  File "spacy\pipeline\_parser_internals\ner.pyx", line 310, in spacy.pipeline._parser_internals.ner.BiluoPushDown.set_costs
ValueError

By printing ner.move_names, I can see that the ANIMAL label was added.

When I change the value to LABEL = 'PERSON', the code runs successfully and recognizes horses as PERSON on the new data. This is why I assume there is no error in the code itself.

Is there something I am missing? What am I doing wrong? Could someone reproduce, please?

NOTE: This is my first question ever here. I hope I provided all information. If not, let me know in the comments.

Asked By: krisograbek

||

Answers:

You need to change the following line in the for loop:

doc = nlp(text)

to

doc = nlp.make_doc(text)

The code should work and produce the following results:

{'ner': 9.60289144264557}
{'ner': 8.875474230820478}
{'ner': 6.370401408220459}
{'ner': 6.687456469517201}
... 
{'ner': 1.3796682589133492e-05}
{'ner': 1.7709562613218738e-05}

Entities in 'Do you like horses?'
ANIMAL  --  horses
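
A likely explanation, offered as a hedged reading rather than a confirmed diagnosis: nlp(text) runs the whole pipeline, so the Doc passed to Example.from_dict may already carry the model's own predicted entities, whereas nlp.make_doc(text) only tokenizes. The sketch below shows the corrected loop in a self-contained form. It assumes a blank English pipeline (spacy.blank("en")) instead of en_core_web_sm so that it runs without a model download, a shortened TRAIN_DATA, and a hypothetical helper build_examples that is not part of spaCy.

```python
import random

import spacy
from spacy.training import Example

LABEL = "ANIMAL"
TRAIN_DATA = [
    ("Horses are too tall", {"entities": [(0, 6, LABEL)]}),
    ("Do they bite?", {"entities": []}),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]


def build_examples(nlp, train_data):
    """Hypothetical helper: pair a tokenized-only Doc with its gold annotations."""
    return [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]


# Assumption: a blank pipeline with a fresh NER component, not en_core_web_sm.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label(LABEL)
optimizer = nlp.initialize(lambda: build_examples(nlp, TRAIN_DATA))

for itn in range(5):
    random.shuffle(TRAIN_DATA)
    losses = {}
    # Rebuild the examples each pass so every Doc comes from make_doc, not nlp(text).
    for example in build_examples(nlp, TRAIN_DATA):
        nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
    print(losses)
```

With a pretrained pipeline, the same loop body applies: only the nlp.make_doc(text) call inside the helper matters for the fix described above.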
Answered By: Thang Pham

Another potential cause is misaligned label offsets in the training corpus. Check whether the training texts contain extra spaces. If so, first remove the extra spaces from the text, then recalculate the start and end positions of each label within the cleaned text.
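
The clean-up described here could be sketched roughly as follows, using only the standard library. The function name normalize_and_locate and its (phrase, label) input format are assumptions made for illustration; they are not part of spaCy.

```python
import re


def normalize_and_locate(text, phrases_with_labels):
    """Collapse runs of whitespace, then recompute character offsets per phrase."""
    clean = re.sub(r"\s+", " ", text).strip()
    entities = []
    for phrase, label in phrases_with_labels:
        # Record every occurrence of the phrase in the cleaned text.
        for match in re.finditer(re.escape(phrase), clean):
            entities.append((match.start(), match.end(), label))
    return clean, {"entities": entities}


text, annotations = normalize_and_locate(
    "they pretend to care  about your feelings,  those horses ",
    [("horses", "ANIMAL")],
)
for start, end, label in annotations["entities"]:
    # Each slice should reproduce the labeled phrase exactly.
    print(label, repr(text[start:end]), start, end)
```

Note that this simple matching also hits substrings inside longer words and can produce overlapping spans when phrases overlap, so the resulting offsets are worth inspecting before training.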

Answered By: Devendra Soni