pickle data was truncated

Question:

I created a corpus and then stored it in a pickle file.
My `messages` variable is a DataFrame containing a collection of different news articles.

from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
import re

ps = PorterStemmer()
stop_words = set(stopwords.words('english'))  # build the set once, not per word
corpus = []
for i in range(0, len(messages)):
    review = re.sub('[^a-zA-Z]', ' ', messages['text'][i])  # keep letters only
    review = review.lower()
    review = review.split()
    review = [ps.stem(word) for word in review if word not in stop_words]
    review = ' '.join(review)
    corpus.append(review)

import pickle
with open('corpus.pkl', 'wb') as f:
    pickle.dump(corpus, f)
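As a quick sanity check (a minimal sketch; the two sample strings are placeholders, not my real corpus), the pickle can be loaded straight back after writing to confirm the file on disk is complete:

```python
import pickle

# A tiny stand-in for the real corpus list
sample = ["stemmed news articl one", "anoth clean review"]
with open('corpus.pkl', 'wb') as f:
    pickle.dump(sample, f)

# Read it straight back; if this succeeds, the file itself is intact
with open('corpus.pkl', 'rb') as f:
    loaded = pickle.load(f)

assert loaded == sample
```

If this round trip succeeds on Colab but loading fails after downloading, the file was damaged in transit rather than during pickling.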

I ran the same code on my laptop (in a Jupyter notebook) and on Google Colab.

corpus.pkl => created on Google Colab and downloaded with the following code:

from google.colab import files
files.download('corpus.pkl')

corpus1.pkl => saved by the same code in the Jupyter notebook.

Now when I run this code:

import pickle
with open('corpus.pkl', 'rb') as f:   # google colab
    corpus = pickle.load(f)

I get the following error:

UnpicklingError: pickle data was truncated
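For context, this error can be reproduced deliberately (a minimal sketch, unrelated to my actual files) by chopping bytes off the end of a valid pickle, which is effectively what an interrupted download does:

```python
import pickle

data = pickle.dumps(list(range(1000)))
truncated = data[:len(data) // 2]  # simulate an incomplete download

try:
    pickle.loads(truncated)
except (pickle.UnpicklingError, EOFError) as e:
    # typically "pickle data was truncated" or "Ran out of input",
    # depending on where the data is cut off
    print(type(e).__name__, e)
```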

But this works fine:

import pickle
with open('corpus1.pkl', 'rb') as f:  # jupyter notebook saved
    corpus = pickle.load(f)

The only difference between the two is that corpus1.pkl was created and saved through the Jupyter notebook (locally), while corpus.pkl was saved on Google Colab and then downloaded.

Could anybody tell me why this is happening?

For reference:

corpus.pkl  => 36 MB
corpus1.pkl => 50.5 MB
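Since the two files differ in size, the Colab copy looks like it was cut off mid-download. One way to confirm this (a hedged sketch; the path is a placeholder) is to compare a checksum computed on Colab before downloading against one computed locally afterwards:

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large pickles don't need to fit in RAM."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

# Run once on Colab before downloading and once locally afterwards:
# print(md5_of('corpus.pkl'))
# Matching digests mean the download completed intact.
```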
Asked By: omkar patil


Answers:

I would just use the pickle file created on my local machine; that one works properly.

Answered By: omkar patil

The problem occurs due to a partial download of the GloVe vectors. I uploaded the data through Colab's upload to session storage, and after that simply ran:

with open('/content/glove_vectors', 'rb') as f:
    model = pickle.load(f)
    glove_words = set(model.keys())

Answered By: himanshu mangoli