Making asynchronous requests to a Vertex AI endpoint (Google Cloud Platform)

Question:

I deployed a model to the model registry on Vertex AI. I added an endpoint too, and I am able to make inferences. Below is the code that I wrote (using Python 3.9.12):

import json
from typing import List

from google.cloud import aiplatform
from google.oauth2 import service_account

# settings is a Pydantic BaseSettings subclass object
credentials_json = json.loads(settings.GCP_VERTEX_SERVICE_ACC)
credentials = service_account.Credentials.from_service_account_info(
    info=credentials_json
)
aiplatform.init(project=settings.GCLOUD_PROJECT_NUMBER,
                location=settings.GCLOUD_LOCATION,
                credentials=credentials)
endpoint = aiplatform.Endpoint(settings.GCLOUD_SBERT_ENDPOINT_ID)

...

async def do_inference(list_strs: List[str]):
    # endpoint.predict is a blocking call, so this coroutine
    # never actually yields control to the event loop
    result = endpoint.predict(instances=list_strs)
    return result.predictions

Right now I’m not able to make asynchronous requests. Is there a way around this? For instance, would using the aiplatform_v1beta1.PredictionServiceAsyncClient class be a solution? Thanks in advance!

--- EDIT ---

Below is the piece of code that did it for me in case someone else is struggling with the same thing.

import asyncio
import json
from typing import List

from google.cloud import aiplatform_v1beta1
from google.oauth2 import service_account

# settings is a Pydantic BaseSettings subclass object
credentials_json = json.loads(settings.GCP_VERTEX_SERVICE_ACC)
credentials = service_account.Credentials.from_service_account_info(
    info=credentials_json
)

client_options = {"api_endpoint": f"{settings.GCLOUD_LOCATION}-aiplatform.googleapis.com"}
client = aiplatform_v1beta1.PredictionServiceAsyncClient(credentials=credentials, client_options=client_options)

...

# Fully qualified endpoint resource name:
# projects/{project}/locations/{location}/endpoints/{endpoint_id}
endpoint = f"projects/{settings.GCLOUD_PROJECT_NUMBER}/locations/{settings.GCLOUD_LOCATION}/endpoints/{settings.GCLOUD_SBERT_ENDPOINT_ID}"

async def do_inference(list_strs: List[str]):
    request = aiplatform_v1beta1.PredictRequest(endpoint=endpoint)
    request.instances.extend(list_strs)
    response = await client.predict(request)
    return response.predictions

asyncio.get_event_loop().run_until_complete(do_inference(["example input"]))

This code owes a lot to @milad_raesi’s answer!

Asked By: mr_faulty


Answers:

Yes, using aiplatform_v1beta1.PredictionServiceAsyncClient would be a solution for making asynchronous requests. PredictionServiceAsyncClient is part of the Google Cloud AI Platform client library for Python, which provides an asynchronous API for making predictions on Vertex AI.

Here is an example of how you could modify your code to use the PredictionServiceAsyncClient to make asynchronous requests:

import json
from typing import List

from google.cloud import aiplatform_v1beta1
from google.oauth2 import service_account

# settings is a Pydantic BaseSettings subclass object
credentials_json = json.loads(settings.GCP_VERTEX_SERVICE_ACC)
credentials = service_account.Credentials.from_service_account_info(
    info=credentials_json
)

client_options = {"api_endpoint": f"{settings.GCLOUD_LOCATION}-aiplatform.googleapis.com"}
client = aiplatform_v1beta1.PredictionServiceAsyncClient(credentials=credentials, client_options=client_options)

# Endpoint name looks like: projects/{project_id}/locations/{location}/endpoints/{endpoint_id}
endpoint_name = f"projects/{settings.GCLOUD_PROJECT_NUMBER}/locations/{settings.GCLOUD_LOCATION}/endpoints/{settings.GCLOUD_SBERT_ENDPOINT_ID}"

...

async def do_inference(list_strs: List[str]):
    # Wrap each input string in the instance format the model expects
    instances = [{"content": s} for s in list_strs]
    request = aiplatform_v1beta1.PredictRequest(endpoint=endpoint_name, instances=instances)

    # Await the non-blocking predict call
    response = await client.predict(request)
    return response.predictions

In this modified code, aiplatform_v1beta1.PredictionServiceAsyncClient is used to create a client that can make asynchronous requests to the endpoint. You also need to set api_endpoint in client_options so it points to the regional Vertex AI API endpoint (e.g. us-central1-aiplatform.googleapis.com).

The endpoint_name variable is constructed using the project ID, location, and endpoint ID, and it is used to specify the endpoint when creating the aiplatform_v1beta1.PredictRequest object.

The do_inference function has also been modified to construct the instances that are passed to the endpoint, create the aiplatform_v1beta1.PredictRequest object, and make an asynchronous request using the client.predict method. Finally, the function returns the predictions.
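If you would rather stay on the stable synchronous SDK, another option is to offload the blocking endpoint.predict call to a worker thread with asyncio.to_thread (available since Python 3.9, which the question uses). A minimal sketch of that pattern, with a stand-in blocking function in place of a real Vertex AI call:

```python
import asyncio
import time
from typing import List

def blocking_predict(batch: List[str]) -> List[int]:
    # Stand-in for the synchronous endpoint.predict(...) call;
    # the sleep simulates network latency.
    time.sleep(0.01)
    return [len(s) for s in batch]

async def do_inference(batch: List[str]) -> List[int]:
    # asyncio.to_thread runs the blocking call in a worker thread,
    # so the event loop stays free to serve other tasks meanwhile.
    return await asyncio.to_thread(blocking_predict, batch)

out = asyncio.run(do_inference(["a", "bb", "ccc"]))
print(out)  # [1, 2, 3]
```

This avoids pinning your code to the v1beta1 surface, at the cost of tying up one thread per in-flight request.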

Note that using the PredictionServiceAsyncClient library allows you to make asynchronous requests, which can improve the performance of your code by allowing it to continue executing while waiting for a response from the endpoint.
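The performance benefit shows up when you issue many predict calls concurrently with asyncio.gather: total wall time approximates the slowest single call rather than the sum of all calls. A minimal sketch of the pattern, using a stand-in coroutine in place of a real client.predict call:

```python
import asyncio
from typing import List

async def fake_predict(batch: List[str]) -> List[float]:
    # Stand-in for an awaited client.predict(...) call;
    # the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return [float(len(s)) for s in batch]

async def predict_many(batches: List[List[str]]) -> List[List[float]]:
    # All requests are in flight at once; results come back
    # in the same order as the input batches.
    return await asyncio.gather(*(fake_predict(b) for b in batches))

results = asyncio.run(predict_many([["a", "bb"], ["ccc"]]))
print(results)  # [[1.0, 2.0], [3.0]]
```

In real code you would replace fake_predict with an awaited client.predict(request) per batch, as in the answer above.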

Answered By: Milad Raeisi