How do I save a pickled file to a GCP bucket from a Jupyter environment?

Question:

I’m trying to save an ArviZ inference object as a pickle file to a GCP Storage bucket using the following function:

def upload_to_bucket(model, blob_name, bucket_name):
    """ 
    model: trace object
    Upload data to a bucket"""
     
    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json(
        'key1.json')

    #print(buckets = list(storage_client.list_buckets())

    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    
    with open(f"gs://{config['GCS_bucket']}//{config['blob_name']}//", 'wb') as filehandler4:
        # Call load method to deserialze
        pickle.dump(model, filehandler4, protocol=4)
    
    #returns a public url
    return blob.public_url

upload_to_bucket(model=trace, blob_name='korea_hierarchy_seasonal_price.pkl', bucket_name='korea-forecasting-bucket')

However, I keep getting the following error:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_45766/484011007.py in <module>
----> 1 upload_to_bucket(model=trace, blob_name='korea_hierarchy_seasonal_price.pkl', bucket_name='korea-forecasting-bucket')

~/google_bucket.py in upload_to_bucket(model, blob_name, bucket_name)
     46     #     pickle.dump(model, filehandler4, protocol=4)
     47 
---> 48     with open(f"gs://{'GCS_bucket'}//{'blob_name'}//", 'wb') as filehandler4:
     49         # Call load method to deserialze
     50         pickle.dump(model, filehandler4, protocol=4)

FileNotFoundError: [Errno 2] No such file or directory: 'gs://GCS_bucket//blob_name//'

The answers to similar questions use the gs:// prefix when saving to GCP Cloud Storage, so I’m not sure what I’m doing wrong.

Asked By: Jordan


Answers:

You need to use the Cloud Storage (GCS) client library’s methods to read and write objects (blobs).

You’re passing a gs:// URL to Python’s built-in open, which only understands regular (local) file-system paths, so this won’t work.

One option is to use the GCS equivalent of open, Blob.open.

(I’ve not tried this but) you should be able to:

with blob.open("wb") as f:
    pickle.dump(model, f, protocol=4)
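
For completeness, here is how the whole function from the question could be rewritten around blob.open (this needs a reasonably recent google-cloud-storage release). The sketch below is untested; the key1.json path and pickle protocol are carried over from the question:

import pickle

from google.cloud import storage


def upload_to_bucket(model, blob_name, bucket_name):
    """Pickle `model` and upload it to a GCS bucket; return the public URL."""
    # Authenticate explicitly with a service-account key file, as in the question.
    storage_client = storage.Client.from_service_account_json('key1.json')

    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)

    # blob.open returns a file-like object that streams straight to the
    # object in the bucket, so no gs:// path or local file is involved.
    with blob.open("wb") as f:
        pickle.dump(model, f, protocol=4)

    return blob.public_url

Reading the trace back later follows the same pattern with blob.open("rb") and pickle.load. Alternatively, blob.upload_from_string(pickle.dumps(model)) uploads in a single call, at the cost of holding the entire serialized object in memory.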
Answered By: DazWilkin