topic-modeling

Unable to install top2vec

Question: I'm trying to install top2vec for my topic analysis project. To install it, I ran pip install top2vec. The installation starts normally but ends with this error: Preparing metadata (setup.py): started Preparing metadata (setup.py): finished with status 'error' error: subprocess-exited-with-error python setup.py egg_info did not run successfully. …

Total answers: 1

Iterate function across dataframe

Question: I have a dataset of pre-processed online reviews; each row contains the words from one review. I am running Latent Dirichlet Allocation to extract topics from the entire dataframe. Now I want to assign topics to each row of data using an LDA function called get_document_topics. I found …

Total answers: 2

__init__() got an unexpected keyword argument 'cachedir' when importing top2vec

Question: I keep getting this error when importing top2vec. TypeError Traceback (most recent call last) Cell In [1], line 1 ----> 1 from top2vec import Top2Vec File ~\AppData\Roaming\Python\Python39\site-packages\top2vec\__init__.py:1 ----> 1 from top2vec.Top2Vec import Top2Vec 3 __version__ = '1.0.27' File ~\AppData\Roaming\Python\Python39\site-packages\top2vec\Top2Vec.py:12 10 from gensim.models.phrases import Phrases …

Total answers: 2

Removal of Stop Words and Stemming/Lemmatization for BERTopic

Question: For topic modelling, I'm trying out BERTopic on my custom dataset, and I'm a little confused. Since BERT was trained in such a way that it holds the semantic meaning of the text/document, should I be removing the stop …

Total answers: 3

Topic modeling on short texts Python

Question: I want to do topic modeling on short texts. I did some research on LDA and found that it doesn't work well with short texts. What methods would be better, and do they have Python implementations? Asked By: Sri Test || Source Answers: You can try Short Text …

Total answers: 4

A practical example of GSDMM in python?

Question: I want to use GSDMM to assign topics to some tweets in my data set. The only examples I found (1 and 2) are not detailed enough. I was wondering if you know of a source (or care enough to make a small example) that shows how …

Total answers: 4

Ways of obtaining a similarity metric between two full text documents?

Question: So imagine I have three text documents, for example (say, 3 randomly generated texts). Document 1: “Whole every miles as tiled at seven or. Wished he entire esteem mr oh by. Possible bed you pleasure civility boy elegance ham. He prevent request by …

Total answers: 2

Latent Dirichlet Allocation with prior topic words

Question: Context: I'm trying to extract topics from a set of texts using Latent Dirichlet Allocation from scikit-learn's decomposition module. This works really well, except for the quality of the topic words found/selected. In an article by Li et al. (2017), the authors describe using prior topic words as …

Total answers: 3

Understanding LDA implementation using gensim

Question: I am trying to understand how the gensim package in Python implements Latent Dirichlet Allocation. I am doing the following. Define the dataset: documents = ["Apple is releasing a new product", "Amazon sells many things", "Microsoft announces Nokia acquisition"] After removing stopwords, I create the dictionary and the corpus: texts …

Total answers: 5