Pickling Python objects to Google Cloud Storage
Question:
I’ve been pickling objects to the filesystem and reading them back when I need to work with those objects. Currently I have this code for that purpose.
def pickle(self, directory, filename):
    if not os.path.exists(directory):
        os.makedirs(directory)
    with open(directory + '/' + filename, 'wb') as handle:
        pickle.dump(self, handle)

@staticmethod
def load(filename):
    with open(filename, 'rb') as handle:
        element = pickle.load(handle)
        return element
Now I’m moving my application (Django) to Google App Engine and found that App Engine does not allow me to write to the filesystem. Google Cloud Storage seemed like my only choice, but I could not understand how to pickle my objects as Cloud Storage objects and read them back to recreate the original Python objects.
Answers:
You can use the Cloud Storage client library.
Instead of open(), use cloudstorage.open() (or gcs.open() if importing cloudstorage as gcs, as in the above-mentioned doc), and note that the full filepath starts with the GCS bucket name (as a dir).
More details in the cloudstorage.open() documentation.
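For illustration, here is a minimal sketch of the question’s two helpers on top of that client. It assumes the legacy App Engine cloudstorage package is importable and that '/my-bucket' is a placeholder for your own bucket name; save_object and load_object are hypothetical names, not part of the library.
import pickle
import cloudstorage as gcs

def save_object(obj, path):
    # path must start with the bucket, e.g. '/my-bucket/some-dir/my-file.pkl'
    with gcs.open(path, 'w') as handle:
        handle.write(pickle.dumps(obj))

def load_object(path):
    with gcs.open(path, 'r') as handle:
        return pickle.loads(handle.read())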
For Python 3 users, you can use the gcsfs library from the creator of Dask to solve your issue.
Example of reading:
import gcsfs
fs = gcsfs.GCSFileSystem(project='my-google-project')
fs.ls('my-bucket')
>>> ['my-file.txt']
with fs.open('my-bucket/my-file.txt', 'rb') as f:
    print(f.read())
It is basically identical to using pickle:
with fs.open(directory + '/' + filename, 'wb') as handle:
    pickle.dump(self, handle)
To read, it is similar, but replace wb with rb and dump with load:
with fs.open(directory + '/' + filename, 'rb') as handle:
    pickle.load(handle)
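Putting it together, here is a sketch of how the question’s class methods might look on top of gcsfs. The project name, bucket path, and the Element class name are placeholders, not anything gcsfs requires.
import pickle
import gcsfs

fs = gcsfs.GCSFileSystem(project='my-google-project')

class Element:
    def pickle(self, directory, filename):
        # directory should start with the bucket, e.g. 'my-bucket/models';
        # gcsfs creates objects on write, so there is no need for os.makedirs
        with fs.open(directory + '/' + filename, 'wb') as handle:
            pickle.dump(self, handle)

    @staticmethod
    def load(filename):
        with fs.open(filename, 'rb') as handle:
            return pickle.load(handle)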
One other option (I tested it with TensorFlow 2.2.0) which also works with Python 3:
from tensorflow.python.lib.io import file_io
with file_io.FileIO('gs://....', mode='rb') as f:
    pickle.load(f)
This is very useful if you already use TensorFlow, for example.
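Writing works the same way in reverse; a small sketch, assuming 'gs://my-bucket/my-file.pkl' stands in for your own object path and obj is a placeholder for whatever you want to store:
import pickle
from tensorflow.python.lib.io import file_io

obj = {'example': 123}  # placeholder object to pickle
with file_io.FileIO('gs://my-bucket/my-file.pkl', mode='wb') as f:
    pickle.dump(obj, f)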