How to save a PyTorch model directly in an S3 bucket?

Question:

The title says it all: I want to save a PyTorch model in an S3 bucket. What I tried was the following:

import boto3

s3 = boto3.client('s3')
saved_model = model.to_json()
output_model_file = output_folder + "pytorch_model.json"
s3.put_object(Bucket="power-plant-embeddings", Key=output_model_file, Body=saved_model)

Unfortunately this doesn't work, as .to_json() only exists for Keras/TensorFlow models. Does anyone know how to do it in PyTorch?

Asked By: spadel


Answers:

With PyTorch, we use cloudpickle to serialize and save our model:

# Serialize the model
import cloudpickle
from os import path

with open(path.join(path_to_generic_model_artifact, "model.pkl"), "wb") as outfile:
    # model is the trained model object
    cloudpickle.dump(model, outfile)

Deserialize the model:

import pickle
import os

# Load the serialized model back from disk
model = pickle.load(open(os.path.join(model_dir, model_file_name), 'rb'))

Answered By: Hussain Bohra
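
The answer above only writes the pickle to local disk; to actually get it into S3 (which is what the question asks), you can then upload that file with boto3. A minimal sketch, where the bucket name comes from the question and the key is an assumed placeholder:

import boto3
from os import path

s3 = boto3.client('s3')

# Upload the pickled model file; "models/model.pkl" is a hypothetical key
s3.upload_file(
    path.join(path_to_generic_model_artifact, "model.pkl"),
    "power-plant-embeddings",
    "models/model.pkl",
)
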
  1. The first step is to serialize your model to a file. There are many ways to do it; with the basic PyTorch library you can use the out-of-the-box tools:
    import torch

    # Serialize the entire model to a file
    torch.save(the_model, 'your/path/to/model')
  2. Once you have it on disk, you can upload it to S3.
    import boto3

    s3 = boto3.resource('s3')
    s3.Bucket('bucketname').upload_file('your/path/to/model', 'folder/sub/path/to/s3key')
  3. Later you can simply download it and deserialize it back into the model.
    s3 = boto3.resource('s3')

    s3.Bucket('bucketname').download_file(
        'folder/sub/path/to/s3key',
        'your/path/to/model'
    )

    the_model = torch.load('your/path/to/model')
Answered By: GensaGames
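
If you would rather not manage a local path yourself, the save-then-upload steps above can be combined with a temporary file. A minimal sketch, assuming a POSIX system and the same placeholder bucket and key names:

import tempfile

import boto3
import torch

s3 = boto3.resource('s3')

# Save to a temporary file, upload it, then let the file be cleaned up
with tempfile.NamedTemporaryFile(suffix='.pt') as tmp:
    torch.save(the_model, tmp.name)
    s3.Bucket('bucketname').upload_file(tmp.name, 'folder/sub/path/to/s3key')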

Try serializing the model to an in-memory buffer and writing it to S3:

import io

buffer = io.BytesIO()
torch.save(model, buffer)
# s3 is the boto3 client from the question
s3.put_object(Bucket="power-plant-embeddings", Key=output_model_file, Body=buffer.getvalue())
Answered By: igrinis
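
The reverse direction works the same way: download the object into memory and deserialize it without touching disk. A minimal sketch, assuming the same bucket and key:

import io

import boto3
import torch

s3 = boto3.client('s3')

# Read the object body into an in-memory buffer and load the model from it
obj = s3.get_object(Bucket="power-plant-embeddings", Key=output_model_file)
buffer = io.BytesIO(obj["Body"].read())
model = torch.load(buffer)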

To expand a bit on the previous answers: there are two different guidelines in the PyTorch documentation on how to save a model, based on what you want to do with it later when you load it again.

  1. If you want to load the model for inference (i.e., to run predictions), then the documentation recommends using torch.save(model.state_dict(), PATH).
  2. If you want to load the model to resume training, then the documentation recommends saving a bit more, so that the training state can be properly restored:
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
    ...
}, PATH)
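
Loading that checkpoint to resume training then mirrors the save; this follows the same PyTorch documentation, with PATH the same placeholder as above:

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.train()  # put the model back into training mode before resuming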

In terms of moving those saved models into S3, the modelstore open source library could help you with that. Under the hood, this library is calling those same save() functions, creating a zip archive of the resulting files, and then storing models into a structured prefix in an S3 bucket. In practice, using it would look like this:

import os

from modelstore import ModelStore

model_store = ModelStore.from_aws_s3(os.environ["AWS_BUCKET_NAME"])

model, optim = train()  # Your training code

# The upload function takes a domain string to organise and version your models
model_store.pytorch.upload("my-model-domain", model=model, optimizer=optim)
Answered By: neal

Has anyone tried using s3fs to save or load a model directly to/from S3?

Answered By: vbfh
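
torch.save and torch.load can work with file-like objects, so this should be possible. A minimal, untested sketch with s3fs, reusing the question's bucket name and a hypothetical key:

import s3fs
import torch

fs = s3fs.S3FileSystem()

# Write the state_dict straight to S3 through a file-like handle
with fs.open("power-plant-embeddings/pytorch_model.pt", "wb") as f:
    torch.save(model.state_dict(), f)

# And read it back the same way
with fs.open("power-plant-embeddings/pytorch_model.pt", "rb") as f:
    state_dict = torch.load(f)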