LDA Mallet Gensim CalledProcessError
Question:
Seems like many people are having issues with Mallet.
import os
from gensim.models.wrappers import LdaMallet
os.environ.update({'MALLET_HOME':r'C:/Users/myusername/Desktop/Topic_Modelling/mallet-2.0.8'})
mallet_path = r'C:/Users/myusername/Desktop/Topic_Modelling/mallet-2.0.8/bin/mallet'
model = LdaMallet(mallet_path, corpus=corpus, num_topics=num_topics, id2word=id2word)
Getting the following errors:
/bin/sh: C:/Users/myusername/Desktop/Topic_Modelling/mallet-2.0.8/bin/mallet.bat: No such file or directory
CalledProcessError: Command 'C:/Users/myusername/Desktop/Topic_Modelling/mallet-2.0.8/bin/mallet.bat import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "S+" --input /var/folders/ml/lxzrtxwn02vbvq65c80g1b640000gn/T/c52cdc_corpus.txt --output /var/folders/ml/lxzrtxwn02vbvq65c80g1b640000gn/T/c52cdc_corpus.mallet' returned non-zero exit status 127.
I downloaded mallet from http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip and unzipped it in my directory. I’ve tried running the command in the error in the terminal and I’m getting the same ‘no such file found’ error, but it’s there in my directory?
I’ve also followed this: https://ps.au.dk/fileadmin/ingen_mappe_valgt/installing_mallet.pdf
When I go to the directory via command line and type ./bin/mallet
I get a whole bunch of commands, which according to the instructions, is what I’m looking for to know that it’s been installed ok.
I’m running the following on macOS:
- Python==3.9.6
- gensim==3.8.3
Anyone have any ideas?
Answers:
As silly as this sounds, I resolved this by changing the path to:
os.environ.update({'MALLET_HOME':r'mallet-2.0.8'})
mallet_path = r'mallet-2.0.8/bin/mallet'
So if the mallet directory is in the same directory as your code (and you run the code from there), this will work!
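A relative path like the one above is resolved against the current working directory, so it stops working as soon as the script is launched from somewhere else. As a minimal sketch, assuming the standard Mallet layout with the launcher at bin/mallet, the hypothetical helper `resolve_mallet` below (not part of gensim) turns the directory into an absolute path and fails early with a readable message if the launcher is missing:

```python
import os
from pathlib import Path


def resolve_mallet(mallet_home):
    """Return (home, launcher) as absolute path strings.

    Hypothetical helper, not part of gensim. Raises FileNotFoundError
    if <mallet_home>/bin/mallet does not exist.
    """
    home = Path(mallet_home).expanduser().resolve()
    launcher = home / "bin" / "mallet"
    if not launcher.is_file():
        raise FileNotFoundError(f"Mallet launcher not found at {launcher}")
    return str(home), str(launcher)
```

You could then call `home, mallet_path = resolve_mallet("mallet-2.0.8")`, set `os.environ["MALLET_HOME"] = home`, and pass `mallet_path` to LdaMallet as before.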
This error can also arise if the JDK is not installed on the system, because LDA Mallet needs Java to run. If you are using Colab, follow these steps:
1. !pip install --upgrade gensim==3.8 (the wrapper classes are only supported in versions before 4.0)
2. Install the JDK in Colab:
import os
def install_java():
    !apt-get install -y openjdk-8-jdk-headless -qq > /dev/null  # install OpenJDK
    os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"  # set environment variable
    !java -version  # check Java version
install_java()
3. Install Mallet:
!wget http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip
!unzip mallet-2.0.8.zip
4. Set the path and run LDA Mallet:
os.environ['MALLET_HOME'] = '/content/mallet-2.0.8'
mallet_path = '/content/mallet-2.0.8/bin/mallet'  # you should NOT need to change this
Hope this helps.
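Outside Colab, it may help to confirm that Java is actually reachable before calling the wrapper. The following is only a sketch: `java_available` is a hypothetical helper name, and it checks nothing more than that a `java` executable is on PATH and that `java -version` exits successfully.

```python
import shutil
import subprocess


def java_available():
    """Return True if a `java` executable is on PATH and `java -version` succeeds.

    Hypothetical helper for illustration; not part of gensim or Mallet.
    """
    java = shutil.which("java")
    if java is None:
        return False
    # `java -version` prints to stderr, so capture output to keep it quiet.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return result.returncode == 0
```

If this returns False, installing a JDK (as in step 2 above for Colab) is the first thing to try.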