ValueError('[E999] Unable to merge the Doc objects because they do not all share the same `Vocab`.')
Question:
I want to combine the results of two different spacy.Language objects, but I get the following error:
ValueError('[E999] Unable to merge the Doc objects because they do not all share the same `Vocab`.')
Example Code:
import spacy
from spacy.tokens import Doc
nlp_1 = spacy.blank("en")
ruler = nlp_1.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}, ])
doc_1 = nlp_1('Apple')
nlp_2 = spacy.blank("en")
ruler = nlp_2.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "NAM", "pattern": "Peter"}, ])
doc_2 = nlp_2('Peter')
print(Doc.from_docs([doc_1, doc_2]))
# ValueError: [E999] Unable to merge the Doc objects because they do not all share the same `Vocab`.
Question:
How do I fix this, e.g. share the vocabs between both nlp objects?
Why would I want that?
Let's say I want to analyse an email. It is one document, but the likelihood that a number in the address field is a postal code is much higher than in the footer, where it is probably a phone number. Therefore, depending on the field, I want to apply different "Languages" that share the same vocab and then combine the results into one doc for the email.
Answers:
Provide the existing vocab from the first model for any subsequent models when they’re loaded:
nlp_2 = spacy.blank("en", vocab=nlp_1.vocab)
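For completeness, here is a minimal sketch of the question's example with that one change applied. It assumes spaCy 3.x, where the "entity_ruler" factory is available on a blank pipeline; the variable name `merged` is only illustrative.

```python
import spacy
from spacy.tokens import Doc

# First pipeline: creates its own Vocab.
nlp_1 = spacy.blank("en")
ruler_1 = nlp_1.add_pipe("entity_ruler")
ruler_1.add_patterns([{"label": "ORG", "pattern": "Apple"}])

# Second pipeline: reuses the Vocab of the first one instead of creating a new one.
nlp_2 = spacy.blank("en", vocab=nlp_1.vocab)
ruler_2 = nlp_2.add_pipe("entity_ruler")
ruler_2.add_patterns([{"label": "NAM", "pattern": "Peter"}])

doc_1 = nlp_1("Apple")
doc_2 = nlp_2("Peter")

# Both docs now point at the same Vocab object, so merging no longer raises E999.
merged = Doc.from_docs([doc_1, doc_2])
print(merged.text)
print([(ent.text, ent.label_) for ent in merged.ents])
```

Because the two pipelines share a single Vocab (and therefore a single string store), labels and lexemes added by either pipeline are visible to both.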