How to check if a key exists in a word2vec trained model or not

Question:

I have trained a word2vec model on a corpus of documents with Gensim. Once the model is trained, I use the following code to get the raw feature vector of a word, say “view”.

myModel["view"]

However, I get a KeyError for the word, probably because it doesn’t exist as a key in the vocabulary indexed by word2vec. How can I check whether a key exists in the index before trying to get the raw feature vector?

Asked By: London guy


Answers:

Answering my own question here.

Word2Vec provides a method named __contains__('view'), which returns True or False depending on whether the word has been indexed. Python calls this method for the in operator, so 'view' in myModel performs the same check.

Answered By: London guy

Word2Vec also provides a ‘vocab’ member, which you can access directly.

Using a Pythonic approach:

if word in w2v_model.vocab:
    # Do something

EDIT Since gensim release 2.0, the API for Word2Vec changed. To access the vocabulary you should now use this:

if word in w2v_model.wv.vocab:
    # Do something

EDIT 2 The attribute ‘wv’ is being deprecated and will be completely removed in gensim 4.0.0. Now it’s back to the original answer by OP:

if word in w2v_model.vocab:
    # Do something
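
Since the attribute holding the vocabulary has moved between gensim releases, one option is a small helper that probes for it. The sketch below is only an illustration: has_word is a hypothetical name, and the SimpleNamespace objects are stand-ins that mimic the attribute layouts mentioned in this thread (model.vocab, model.wv.vocab, model.wv.key_to_index), not real gensim models.

```python
from types import SimpleNamespace


def has_word(model, word):
    """Return True if `word` is in the model's vocabulary (hypothetical helper)."""
    # Full Word2Vec models keep their vectors under .wv; a bare vectors object does not.
    vectors = getattr(model, "wv", model)
    # Try the attribute names mentioned in this thread, newest first.
    for attr in ("key_to_index", "vocab"):
        mapping = getattr(vectors, attr, None)
        if mapping is not None:
            return word in mapping
    raise AttributeError("no vocabulary mapping found on this object")


# Stand-in objects that only mimic the attribute layouts; not real gensim models.
new_style = SimpleNamespace(wv=SimpleNamespace(key_to_index={"view": 0}))
old_style = SimpleNamespace(vocab={"view": object()})

print(has_word(new_style, "view"))  # True
print(has_word(old_style, "food"))  # False
```

With a real model you would pass the trained model itself, e.g. has_word(w2v_model, word), assuming it exposes one of the attributes above.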
Answered By: Matt Fortier

Hey, I know I’m late to this post, but here is a piece of code that handles this issue well. I use it in my own code and it works like a charm 🙂

size = 300  # word vector size
word = 'food'  # word token

try:
    # reshape the raw vector into a single row of shape (1, size)
    wordVector = model[word].reshape((1, size))
except KeyError:
    print("not found!", word)

NOTE:
I am using the Python Gensim library for word2vec models.
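
If you need this in several places, the same try/except can be wrapped in a function that returns a fallback value instead of printing. This is a rough sketch: lookup_vector is a made-up name, and a plain dict stands in for the trained model (it raises KeyError on missing keys in the same way).

```python
def lookup_vector(vectors, word, default=None):
    """Return vectors[word], or `default` if the word is not indexed."""
    try:
        return vectors[word]
    except KeyError:
        return default


# A plain dict stands in for the trained model; it is not a gensim object.
toy_vectors = {"food": [0.1, 0.2, 0.3]}

print(lookup_vector(toy_vectors, "food"))  # [0.1, 0.2, 0.3]
print(lookup_vector(toy_vectors, "view"))  # None
```

Returning None (or a zero vector of your choice) lets the caller decide how to treat unseen words.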

Answered By: Nomiluks

Get the word vectors from the model with

word_vectors = model.wv

Then we can use

if 'word' in word_vectors.vocab:
    # do something
Answered By: rakaT

I generally use a filter:

for doc in labeled_corpus:
    words = filter(lambda x: x in model.vocab, doc.words)

This is one simple method for getting past the KeyError on unseen words.
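
Note that in Python 3, filter returns a lazy iterator rather than a list. A list comprehension gives the same result as a list and also makes it easy to keep track of the words that were dropped. A minimal sketch, where split_known is a hypothetical helper and a set stands in for the model's vocabulary:

```python
def split_known(tokens, vocabulary):
    """Split tokens into those present in `vocabulary` and those missing from it."""
    known = [t for t in tokens if t in vocabulary]
    unknown = [t for t in tokens if t not in vocabulary]
    return known, unknown


# A set stands in for the model's vocabulary mapping; only `in` is needed.
toy_vocab = {"the", "view", "is", "nice"}
known, unknown = split_known(["the", "view", "is", "breathtaking"], toy_vocab)

print(known)    # ['the', 'view', 'is']
print(unknown)  # ['breathtaking']
```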

Answered By: Prakhar Agarwal
Answered By: quemeful

As @quemeful mentioned, you could do something like:

if "view" in model.wv.key_to_index:
    # do something
Answered By: Ishara Dissanayake

To check whether a word exists in your model, you can build a dictionary that maps each word to its vector:

word2vec_pretrained_dict = dict(zip(w2v_model.key_to_index.keys(), w2v_model.vectors))

where w2v_model.key_to_index gives you a dictionary mapping each word to its index number

and w2v_model.vectors returns the vector for each word. You can then test membership with word in word2vec_pretrained_dict.

Answered By: Faten Elhariry