Topic modeling on short texts in Python

Question:

I want to do topic modeling on short texts. I did some research on LDA and found that it doesn’t work well on short texts. Which methods would work better, and do they have Python implementations?

Asked By: Sri Test


Answers:

You can try Short Text Topic Modelling (STTM); see https://www.groundai.com/project/sttm-a-tool-for-short-text-topic-modeling/1 (code available at https://github.com/qiang2100/STTM). It combines state-of-the-art short-text algorithms with traditional topic models for long text, and it can conveniently be used on short texts.

For a more specialised library, try lda2vec-tf, which combines word vectors with LDA topic vectors. It is a fork of the original lda2vec with improvements, and it gives better results than the original library.

Answered By: red_mouse_coder

As far as I know, the only Python implementation of short text topic modeling is GSDMM. Unfortunately, most of the others are written in Java.
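
For readers who want to see what GSDMM does, here is a minimal sketch of its collapsed Gibbs sampler (the "Movie Group Process") in plain Python. It is an illustration under stated assumptions, not the code of any particular package: the function name `gsdmm`, its parameters, and the toy documents are all made up for this example, and each document is assumed to be a list of already-tokenized words.

```python
import math
import random
from collections import Counter


def gsdmm(docs, n_clusters=8, alpha=0.1, beta=0.1, n_iters=30, seed=0):
    """Cluster tokenized short texts; returns one cluster label per document.

    Hypothetical helper for illustration only (not a library API).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_counts = [Counter(doc) for doc in docs]

    m = [0] * n_clusters                         # documents per cluster
    n = [0] * n_clusters                         # tokens per cluster
    nw = [Counter() for _ in range(n_clusters)]  # word counts per cluster
    labels = []

    def move(i, z, sign):
        # Add (sign=+1) or remove (sign=-1) document i from cluster z.
        m[z] += sign
        n[z] += sign * len(docs[i])
        for w, c in doc_counts[i].items():
            nw[z][w] += sign * c

    # Random initial assignment: every document belongs to exactly one cluster.
    for i in range(len(docs)):
        z = rng.randrange(n_clusters)
        labels.append(z)
        move(i, z, +1)

    for _ in range(n_iters):
        for i, doc in enumerate(docs):
            move(i, labels[i], -1)
            log_p = []
            for z in range(n_clusters):
                # Prefer clusters that are large and share words with the doc.
                lp = math.log(m[z] + alpha)
                for w, c in doc_counts[i].items():
                    for j in range(c):
                        lp += math.log(nw[z][w] + beta + j)
                for j in range(len(doc)):
                    lp -= math.log(n[z] + vocab_size * beta + j)
                log_p.append(lp)
            # Sample a new cluster from the normalized probabilities.
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            labels[i] = rng.choices(range(n_clusters), weights=weights)[0]
            move(i, labels[i], +1)
    return labels


docs = [
    ["cat", "dog", "pet"],
    ["dog", "pet", "food"],
    ["stock", "market", "price"],
    ["market", "price", "trade"],
]
labels = gsdmm(docs, n_clusters=4)
```

Unlike LDA, this model assigns a single topic to each document, which is why it suits short texts. `n_clusters` is only an upper bound: clusters tend to empty out during sampling, so the number of distinct values in `labels` can end up smaller.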

Answered By: Ilya Palachev

Besides GSDMM, there is also the biterm topic model (BTM), which has a Python implementation for short text topic modeling.
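
To illustrate the idea behind the name: a biterm is an unordered pair of words that co-occur in the same short text, and the model learns topics from the biterms of the whole corpus rather than from single documents. The sketch below only shows the biterm extraction step, not the topic inference; `extract_biterms` and the sample texts are hypothetical names chosen for this example.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair that co-occurs in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


texts = ["apple banana fruit", "banana fruit smoothie"]

# Pool the biterms of all texts; these pooled counts are what the model fits.
biterm_counts = Counter(
    biterm for text in texts for biterm in extract_biterms(text.split())
)
```

Pooling the pairs over the corpus gives the model far more co-occurrence evidence than any single short text contains, which is the sparsity problem that hurts LDA here.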

Answered By: chefhose

Here’s a very fast and easy-to-use implementation of GSDMM that can be used from Python: https://github.com/centre-for-humanities-computing/tweetopic

Answered By: Márton Kardos