Python pickle error: UnicodeDecodeError
Question:
I’m trying to do some text classification using Textblob. I’m first training the model and serializing it using pickle as shown below.
import pickle
from textblob.classifiers import NaiveBayesClassifier
with open('sample.csv', 'r') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")
f = open('sample_classifier.pickle', 'wb')
pickle.dump(cl, f)
f.close()
And when I try to run this file:
import pickle
f = open('sample_classifier.pickle', encoding="utf8")
cl = pickle.load(f)
f.close()
I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
Following are the contents of my sample.csv:
My SQL is not working correctly at all. This was a wrong choice, SQL
I’ve issues. Please respond immediately, Support
Where am I going wrong here? Please help.
Answers:
By choosing to open the file in mode wb, you are choosing to write raw binary. No character encoding is applied. Thus, to read this file, you should simply open it in mode rb.
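To see why the error points at byte 0x80 in position 0, here is a minimal sketch. It assumes a plain dict as a stand-in for the classifier, since any pickled object should behave the same way at the start of the stream.

```python
import pickle

# Any picklable object works here; a plain dict stands in for the classifier.
data = pickle.dumps({"label": "SQL"})

# Pickles written with protocol 2 or later start with the PROTO opcode, byte 0x80.
first_byte = data[0]
print(hex(first_byte))

# 0x80 is not a valid UTF-8 start byte, so treating the stream as text fails.
try:
    data.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)
```

Opening the file with encoding="utf8" asks Python to do that same decoding, which is why the load fails before pickle even looks at the data.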
I think you should open the file as
f = open('sample_classifier.pickle', 'rb')
cl = pickle.load(f)
You shouldn't have to decode it. pickle.load will give you an exact copy of whatever it is you saved. At this point, you should be able to work with cl as if you had just created it.
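A rough round-trip sketch of the same idea, using with blocks so the files are closed automatically. The dict named model and the temporary path are hypothetical stand-ins for the trained classifier and its file.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the trained classifier.
model = {"labels": ["SQL", "Support"], "trained": True}

# Write to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "sample_classifier.pickle")

# Write in binary mode ...
with open(path, "wb") as f:
    pickle.dump(model, f)

# ... and read back in binary mode, with no encoding argument to open().
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```

The restored object is a new, equal copy rather than the same object in memory.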
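Maybe the file was encoded using latin1. Note that in Python 3 pickle.load needs a file opened in binary mode, so the encoding should be passed to pickle.load rather than to open:
f = open('sample_classifier.pickle', 'rb')
cl = pickle.load(f, encoding="latin1")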
Since none of the suggested answers helped me with the error, I switched to joblib instead:
import joblib
clf_loaded = joblib.load('classifier_file_name.joblib')
Worked great!
Try this code; it works for me:
import pickle
with open('your pickle file name', 'rb') as f:
    classifier = pickle.load(f, encoding="latin1")
- Note: the encoding argument exists only in Python 3, where it defaults to "ASCII" and controls how 8-bit strings pickled by Python 2 are decoded. If "latin1" does not fix it, you can try "utf-8" or "bytes" instead.
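A small sketch of what the encoding argument does. The byte string below is hand-written for illustration (an assumption, modelled on what Python 2 would emit for an 8-bit string), not taken from the question's file.

```python
import pickle

# Hand-written pickle bytes (an assumption, modelled on what Python 2 emits
# for the byte string 'caf\xe9'): PROTO 2, SHORT_BINSTRING of length 4,
# BINPUT 0, STOP.
py2_style = b"\x80\x02U\x04caf\xe9q\x00."

# Decode the 8-bit string as latin1 text.
as_text = pickle.loads(py2_style, encoding="latin1")

# Or keep it as raw bytes without decoding.
as_bytes = pickle.loads(py2_style, encoding="bytes")

# With the default encoding the non-ASCII byte cannot be decoded.
try:
    pickle.loads(py2_style)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

print(as_text, as_bytes, default_failed)
```

This only matters for pickles that contain Python 2 byte strings. A pickle written by Python 3, as in the question, should load with a plain pickle.load(f) on a file opened in 'rb' mode.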