How to check if a key exists in a word2vec trained model or not
Question:
I have trained a word2vec model on a corpus of documents with Gensim. Once the model is trained, I use the following code to get the raw feature vector of a word, say “view”.
myModel["view"]
However, I get a KeyError for the word, probably because it doesn’t exist as a key in the vocabulary indexed by word2vec. How can I check whether a key exists in the index before trying to get the raw feature vector?
Answers:
Answering my own question here.
Word2Vec provides a method named __contains__(‘view’), which returns True or False depending on whether the word has been indexed.
Word2Vec also provides a ‘vocab’ member, which you can access directly.
Using a Pythonic approach:
if word in w2v_model.vocab:
    # Do something
EDIT Since gensim release 2.0, the API for Word2Vec changed. To access the vocabulary you should now use this:
if word in w2v_model.wv.vocab:
    # Do something
EDIT 2 In gensim 4.0.0 the ‘vocab’ attribute was removed from KeyedVectors; the ‘wv’ attribute itself is still there. The vocabulary is now exposed as ‘key_to_index’:
if word in w2v_model.wv.key_to_index:
    # Do something
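Since the attribute holding the vocabulary has moved between gensim releases, a small helper can hide the difference. The following is only a sketch: has_word is a hypothetical name, and the SimpleNamespace objects are stand-ins that mimic the attributes involved so the snippet runs without a trained model.

```python
from types import SimpleNamespace

def has_word(model, word):
    """Return True if `word` is in the model's vocabulary (hypothetical helper)."""
    # A Word2Vec model keeps its vectors under `.wv`;
    # a KeyedVectors-like object is used as-is.
    kv = getattr(model, "wv", model)
    if hasattr(kv, "key_to_index"):   # gensim 4.x
        return word in kv.key_to_index
    return word in kv.vocab           # older gensim releases

# Stand-ins that mimic only the attributes used above (not real gensim objects).
new_style = SimpleNamespace(wv=SimpleNamespace(key_to_index={"view": 0}))
old_style = SimpleNamespace(vocab={"view": object()})

print(has_word(new_style, "view"))   # True
print(has_word(new_style, "food"))   # False
print(has_word(old_style, "view"))   # True
```

The check for key_to_index comes first because that is the attribute the newest API provides; the fallback to vocab is only reached on older releases.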
Hey, I know I’m late to this post, but here is a piece of code that handles this issue well. I use it in my own code and it works like a charm 🙂
size = 300  # word vector size
word = 'food'  # word token
try:
    # On gensim 4.x, index the KeyedVectors instead: model.wv[word]
    wordVector = model[word].reshape((1, size))
except KeyError:
    print("not found!", word)
NOTE:
I am using the Python Gensim library for word2vec models.
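The same try/except idea can be wrapped in a function that returns a fallback value instead of printing. This is a minimal sketch: vector_or_default is a hypothetical name, and a plain dict stands in for the word-vector container, since both raise KeyError on a missing key.

```python
def vector_or_default(vectors, word, default=None):
    """Return vectors[word], or `default` when the word is not in the vocabulary."""
    try:
        return vectors[word]
    except KeyError:
        return default

# A plain dict stands in for the word vectors here (an assumption for the demo);
# with gensim you would pass the object you normally index by word.
toy_vectors = {"food": [0.1, 0.2, 0.3]}

print(vector_or_default(toy_vectors, "food"))   # [0.1, 0.2, 0.3]
print(vector_or_default(toy_vectors, "view"))   # None
```

Returning a default keeps the calling code free of exception handling, at the cost of having to check for the default afterwards.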
Get the model’s word vectors with
word_vectors = model.wv
Then, on gensim versions before 4.0, we can use
if 'word' in word_vectors.vocab:
    # do something
I generally use a filter:
for doc in labeled_corpus:
    words = filter(lambda x: x in model.vocab, doc.words)
This is one simple method for getting past the KeyError on unseen words.
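The filtering approach can also be written as a list comprehension, which gives a concrete list (in Python 3, filter returns a lazy iterator). This is a sketch only: in_vocab_tokens is a hypothetical helper, and the small dict stands in for whatever vocabulary mapping your gensim version exposes.

```python
def in_vocab_tokens(tokens, vocabulary):
    """Keep only the tokens present in `vocabulary`, preserving their order."""
    return [tok for tok in tokens if tok in vocabulary]

# Stand-in for the model's vocabulary mapping (an assumption for the demo).
vocabulary = {"view": 0, "food": 1}
docs = [["a", "view", "of", "food"], ["nothing", "known"]]

filtered = [in_vocab_tokens(doc, vocabulary) for doc in docs]
print(filtered)   # [['view', 'food'], []]
```

Note that a document can end up empty after filtering, so downstream code that averages vectors should handle that case.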
The vocab attribute was removed from KeyedVectors in Gensim 4.0.0. Try using this:
if 'word' in model.wv.key_to_index:
    # do something
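As @quemeful has mentioned, you could do something like this (the .keys() call is optional, since a membership test on the dictionary itself is equivalent):
if "view" in model.wv.key_to_index.keys():
    # do something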
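To check whether a word exists in your model, you can build a word-to-vector dictionary and look the word up in it:
word2vec_pretrained_dict = dict(zip(w2v_model.key_to_index.keys(), w2v_model.vectors))
where w2v_model.key_to_index
gives you a dictionary mapping each word to its index number,
and w2v_model.vectors
returns the vector for each word, one row per index. Here w2v_model is a KeyedVectors object; for a trained Word2Vec model, use w2v_model.wv instead.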
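A slightly more defensive way to build that dictionary is to use each word’s index to pick its row, rather than relying on the two sequences being in the same order. This is a sketch: word_vector_dict is a hypothetical helper, and the SimpleNamespace object only mimics the two attributes involved.

```python
from types import SimpleNamespace

def word_vector_dict(kv):
    """Map each word to its vector by looking up the row index explicitly."""
    return {word: kv.vectors[idx] for word, idx in kv.key_to_index.items()}

# Stand-in with the two attributes used above (not a real gensim object).
toy = SimpleNamespace(
    key_to_index={"view": 0, "food": 1},
    vectors=[[0.1, 0.2], [0.3, 0.4]],
)

lookup = word_vector_dict(toy)
print("view" in lookup)       # True
print(lookup.get("ocean"))    # None
```

Once the dictionary exists, the usual dict tools (in, .get) answer the membership question directly.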