gensim

Can gensim Doc2Vec be used to compare a novel document to a trained model?

Question: I have a set of documents that all fit a pre-defined category, and I have successfully trained a model on those documents. The question is, if I have a novel document, how can I calculate how closely this new document …

Total answers: 2

Can't load saved gensim word2vec model

Question: I tried saving a word2vec model that I had trained with gensim like so:

    from gensim.models import Word2Vec
    model = Word2Vec(sentences, parameters)
    model.save('modelfile.model')

Now when I try Word2Vec.load('modelfile.model'), I get:

    ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'

I can post the full traceback if it helps. Asked By: snapcrack || Source …

Total answers: 3

Continue training a FastText model

Question: I have downloaded a .bin FastText model, and I use it with gensim as follows: model = FastText.load_fasttext_format("cc.fr.300.bin") I would like to continue training the model to adapt it to my domain. After checking FastText’s GitHub and the Gensim documentation it seems like it is not currently …

Total answers: 3

PyTorch / Gensim – How do I load pre-trained word embeddings?

Question: I want to load a pre-trained word2vec embedding with gensim into a PyTorch embedding layer. How do I get the embedding weights loaded by gensim into the PyTorch embedding layer? Asked By: MBT || Source Answers: I think it is easy. Just copy …

Total answers: 6

How to generate word embeddings in Portuguese using Gensim?

Question: I have the following problem: in English, my code successfully generates word embeddings with Gensim, and similar phrases are close to each other in terms of cosine distance: The angle between “Response time and error measurement” and “Relation of user perceived response time to error measurement” …

Total answers: 1

Import GoogleNews-vectors-negative300.bin

Question: I am working on code that uses gensim and am having a tough time troubleshooting a ValueError in my code. I finally was able to zip the GoogleNews-vectors-negative300.bin.gz file so I could implement it in my model. I also tried gzip, but the results were unsuccessful. The error in the code occurs in the …

Total answers: 6

Doc2Vec Get most similar documents

Question: I am trying to build a document retrieval model that returns documents ordered by their relevance to a query or a search string. For this I trained a doc2vec model using the Doc2Vec model in gensim. My dataset is in the form of a pandas dataset …

Total answers: 1

Interpreting negative Word2Vec similarity from gensim

Question: E.g. we train a word2vec model using gensim:

    from gensim import corpora, models, similarities
    from gensim.models.word2vec import Word2Vec
    documents = ["Human machine interface for lab abc computer applications", "A survey of user opinion of computer system response time", "The EPS user interface management system", "System and human system …

Total answers: 3

How to load a pre-trained Word2vec MODEL File and reuse it?

Question: I want to use a pre-trained word2vec model, but I don’t know how to load it in Python. The file is a MODEL file (703 MB). It can be downloaded here: http://devmount.github.io/GermanWordEmbeddings/ Asked By: Vahid SJ || Source Answers: just for loading import …

Total answers: 4

What meaning does the length of a Word2vec vector have?

Question: I am using Word2vec through gensim with Google’s pretrained vectors trained on Google News. I have noticed that the word vectors I can access by doing direct index lookups on the Word2Vec object are not unit vectors:

    >>> import numpy
    >>> from gensim.models import …

Total answers: 1