How to use cuda as the device on a GPU instance when deploying an endpoint?

Question:

I have the following code to deploy my model:

from sagemaker.pytorch import PyTorchModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

model = PyTorchModel(
    entry_point='inference.py',
    source_dir='code',
    role=role,
    model_data=model_data,
    framework_version="1.12.0",
    py_version='py38',
    code_location='s3://staging',
    name='Staging-Model'
)

instance_type = 'ml.g4dn.xlarge'

predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

In my inference code I have:

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
logger.info('Model will be loaded into:{}'.format(DEVICE))

The logger says the model is being loaded into cpu, even though the instance has a GPU available. How can I load my model into cuda?

Asked By: Diego Rodea


Answers:

As ascertained in the comments, the instance on which the model runs is CPU-based.

This can happen because, by the time the model is deployed, SageMaker assumes the model has already been created with its exact configuration, including the inference container image, so the image resolved for it may be a CPU-only one.

We can make the container image for the model explicit like this:

import sagemaker
from sagemaker.model import Model

# region of the current SageMaker session
region = sagemaker.Session().boto_region_name

# with a GPU instance type, this retrieves 'pytorch-inference:1.12.0-gpu-py38'
inf_img_uri = sagemaker.image_uris.retrieve(
    framework='pytorch',
    region=region,
    image_scope='inference',
    version="1.12.0",
    instance_type='ml.g4dn.xlarge',
    py_version='py38'
)

pytorch_model = Model(
    image_uri=inf_img_uri,
    model_data=model_data,
    role=role,
    entry_point='inference.py',
    source_dir='code',
    code_location='s3://staging',
    name='Staging-Model'
)
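
With the GPU image pinned, deployment can proceed as before, reusing the instance type and serializers from the question. A minimal sketch (note that the generic Model only returns a predictor from deploy() if predictor_cls is set, e.g. sagemaker.predictor.Predictor):

# JSONSerializer / JSONDeserializer come from sagemaker.serializers / sagemaker.deserializers
predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.xlarge',
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)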

If you are executing this within a pipeline, you may need a model creation step before deployment.
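
A minimal sketch of that, assuming SageMaker Pipelines and that pytorch_model is built with a PipelineSession (the step name is illustrative):

from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.model_step import ModelStep

pipeline_session = PipelineSession()

# pytorch_model must be constructed with sagemaker_session=pipeline_session
# so that .create() returns step arguments instead of creating the model immediately
create_model_step = ModelStep(
    name="CreateStagingModel",
    step_args=pytorch_model.create(instance_type='ml.g4dn.xlarge'),
)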

Answered By: Giuseppe La Gualano