ValueError('[E999] Unable to merge the Doc objects because they do not all share the same `Vocab`.')

Question:

I want to combine the results of two different spacy.Language pipelines, but I get the following error:

ValueError('[E999] Unable to merge the Doc objects because they do not all share the same `Vocab`.')

Example Code:

import spacy
from spacy.tokens import Doc

nlp_1 = spacy.blank("en")
ruler = nlp_1.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}, ])
doc_1 = nlp_1('Apple')

nlp_2 = spacy.blank("en")
ruler = nlp_2.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "NAM", "pattern": "Peter"}, ])
doc_2 = nlp_2('Peter')

print(Doc.from_docs([doc_1, doc_2]))
# ValueError: [E999] Unable to merge the Doc objects because they do not all share the same `Vocab`.

Question:
How do I fix this, for example by sharing the vocab between both nlp objects?

Why would I want that?
Let's say I want to analyse a mail. It is one document, but the likelihood that a number in the address field is a postal code is much higher than in the footer, where it is probably a phone number. Therefore, depending on the field, I want to apply different "Languages" that share the same vocab, and then combine the results into one doc for the mail.

Asked By: Andreas

||

Answers:

Provide the existing vocab from the first model for any subsequent models when they’re loaded:

nlp_2 = spacy.blank("en", vocab=nlp_1.vocab)
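For reference, here is a minimal sketch of the question's example with that one change applied. It assumes spaCy v3, where `spacy.blank` accepts a `vocab` keyword argument; the names `ruler_1`, `ruler_2`, and `merged` are only illustrative and do not appear in the original code.

```python
import spacy
from spacy.tokens import Doc

# First pipeline: creates its own Vocab.
nlp_1 = spacy.blank("en")
ruler_1 = nlp_1.add_pipe("entity_ruler")
ruler_1.add_patterns([{"label": "ORG", "pattern": "Apple"}])
doc_1 = nlp_1("Apple")

# Second pipeline: reuses the Vocab of the first one instead of creating a new one.
nlp_2 = spacy.blank("en", vocab=nlp_1.vocab)
ruler_2 = nlp_2.add_pipe("entity_ruler")
ruler_2.add_patterns([{"label": "NAM", "pattern": "Peter"}])
doc_2 = nlp_2("Peter")

# Both docs now reference the same Vocab object, so they can be merged.
merged = Doc.from_docs([doc_1, doc_2])
print(merged.text)
print([(ent.text, ent.label_) for ent in merged.ents])
```

Since both pipelines now hold the same `Vocab` object, strings added while processing with one pipeline are also visible to the other.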
Answered By: aab