HashingVectorizer and MultinomialNB are not working together
Question:
I am trying to write a Twitter sentiment analysis program with scikit-learn in Python 2.7, on Ubuntu 14.04. In the vectorizing step, I want to use HashingVectorizer(). When testing classifier accuracy, it works fine with the LinearSVC, NuSVC, GaussianNB, BernoulliNB, and LogisticRegression classifiers, but MultinomialNB raises this error:
Traceback (most recent call last):
File "/media/test.py", line 310, in <module>
classifier_rbf.fit(train_vectors, y_trainTweets)
File "/home/.local/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 552, in fit
self._count(X, Y)
File "/home/.local/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 655, in _count
raise ValueError("Input X must be non-negative")
ValueError: Input X must be non-negative
[Finished in 16.4s with exit code 1]
Here is the code block related to this error:
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

vectorizer = HashingVectorizer()
train_vectors = vectorizer.fit_transform(x_trainTweets)
test_vectors = vectorizer.transform(x_testTweets)
classifier_rbf = MultinomialNB()
classifier_rbf.fit(train_vectors, y_trainTweets)
prediction_rbf = classifier_rbf.predict(test_vectors)
Why is this happening, and how can I solve it?
Answers:
You need to set the non_negative argument to True when initialising your vectorizer:
vectorizer = HashingVectorizer(non_negative=True)
If the non_negative argument isn't available (as in my version of scikit-learn), use alternate_sign instead:
vectorizer = HashingVectorizer(alternate_sign=False)
The non_negative argument has been replaced with alternate_sign in newer scikit-learn releases, so where you would have used non_negative=True, use alternate_sign=False. By default, HashingVectorizer flips the sign of half the hashed feature values to offset hash collisions, which produces negative entries in the feature matrix; MultinomialNB requires non-negative input, hence the error.
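Putting the fix together, here is a minimal sketch of the corrected pipeline for recent scikit-learn versions (those with alternate_sign). The tiny training texts and labels are placeholders, not data from the question; note also that HashingVectorizer is stateless, so transform() alone is sufficient:

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# Placeholder texts and labels standing in for x_trainTweets / y_trainTweets
x_train = ["i love this", "great day", "i hate this", "awful day"]
y_train = ["pos", "pos", "neg", "neg"]

# alternate_sign=False keeps every hashed feature value non-negative,
# which is what MultinomialNB requires (older versions used non_negative=True)
vectorizer = HashingVectorizer(alternate_sign=False)
train_vectors = vectorizer.transform(x_train)  # stateless: no fit needed

classifier = MultinomialNB()
classifier.fit(train_vectors, y_train)  # no ValueError now
prediction = classifier.predict(vectorizer.transform(["love this day"]))
```

With alternate_sign=False the feature matrix has no negative entries, so MultinomialNB fits without raising the ValueError.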