How to save a PyTorch model directly to an S3 bucket?
Question:
The title says it all – I want to save a PyTorch model in an S3 bucket. What I tried was the following:
import boto3
s3 = boto3.client('s3')
saved_model = model.to_json()
output_model_file = output_folder + "pytorch_model.json"
s3.put_object(Bucket="power-plant-embeddings", Key=output_model_file, Body=saved_model)
Unfortunately this doesn’t work, since .to_json()
is a Keras (TensorFlow) method and PyTorch models don’t have it. Does anyone know how to do it in PyTorch?
Answers:
With PyTorch you can use cloudpickle to serialize and save the model:
# Serialize the model
import cloudpickle
from os import path

with open(path.join(path_to_generic_model_artifact, "model.pkl"), "wb") as outfile:
    # model is the trained model object
    cloudpickle.dump(model, outfile)
Deserialize the model:
import os
import pickle

with open(os.path.join(model_dir, model_file_name), "rb") as infile:
    model = pickle.load(infile)
- The first step is to serialize your model to a file. There are many ways to do it; the core PyTorch library provides this out of the box:
# Serialize the entire model to a file
torch.save(the_model, 'your/path/to/model')
- Once you have it on disk, you can upload it to S3:
s3 = boto3.resource('s3')
s3.Bucket('bucketname').upload_file('your/path/to/model', 'folder/sub/path/to/s3key')
- Later you can simply download it and deserialize it back into the model:
s3 = boto3.resource('s3')
s3.Bucket('bucketname').download_file(
    'folder/sub/path/to/s3key',
    'your/path/to/model'
)
the_model = torch.load('your/path/to/model')
Try serializing the model to an in-memory buffer and writing that buffer to S3 directly:
import io
import boto3
import torch

s3 = boto3.client('s3')

buffer = io.BytesIO()
torch.save(model, buffer)
s3.put_object(Bucket="power-plant-embeddings", Key=output_model_file, Body=buffer.getvalue())
To expand a bit on the previous answers: there are two different guidelines in the PyTorch documentation on how to save a model, based on what you want to do with it later when you load it again.
- If you want to load the model for inference (i.e., to run predictions), the documentation recommends using:
torch.save(model.state_dict(), PATH)
- If you want to load the model to resume training then the documentation recommends doing a bit more, so that you can properly resume training:
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
    ...
}, PATH)
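For the inference case, the key point is that a state_dict holds only the weights, so you recreate the architecture before loading them back. A minimal runnable sketch with a stand-in single-layer model:

```python
import torch
import torch.nn as nn

# Stand-in for your trained model: save only the weights.
model = nn.Linear(4, 2)
torch.save(model.state_dict(), "model_weights.pt")

# To load for inference, instantiate the same architecture first,
# then load the saved weights into it.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("model_weights.pt"))
restored.eval()  # puts dropout/batch-norm layers in evaluation mode
```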
In terms of moving those saved models into S3, the modelstore open source library could help you with that. Under the hood, it calls those same save()
functions, creates a zip archive of the resulting files, and stores the models under a structured prefix in an S3 bucket. In practice, using it looks like this:
import os
from modelstore import ModelStore

model_store = ModelStore.from_aws_s3(os.environ["AWS_BUCKET_NAME"])
model, optim = train()  # Your training code
# The upload function takes a domain string to organise and version your models
model_store.pytorch.upload("my-model-domain", model=model, optimizer=optim)
Has anyone tried using s3fs
to save or load a model directly to/from S3?