How to list Dataproc operations with the `google-cloud-dataproc` client

Question:

I am looking for a way to do the equivalent of the CLI command gcloud dataproc operations list --filter "..." from Python.

The minimal code example:

from google.cloud import dataproc_v1

region = 'us-west1'

client_options = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}

dataproc_cluster_client = dataproc_v1.ClusterControllerClient(client_options=client_options)

def list_operations(dataproc_cluster_client, region):
    for op in dataproc_cluster_client.list_operations(
        request={"filter": f"operationType = CREATE AND labels.goog-dataproc-location:{region}"}
    ):
        print(op)

list_operations(dataproc_cluster_client, region)

The error:

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "Invalid resource field value in the request."
    debug_error_string = "UNKNOWN:Error received from peer ipv4:xxx.xxx.xx.xxx:443 {created_time:"2023-03-28T11:23:00.466125+02:00", grpc_status:3, grpc_message:"Invalid resource field value in the request."}"

What’s wrong? I have failed to find any documentation about this resource field value, its possible values, or how to actually pass it in the request.

Asked By: egordoe

||

Answers:

The error occurs because the name argument (the resource name of the operations collection) is missing from the request. The underlying call, dataproc_cluster_client.transport.operations_client.list_operations(), requires it.

You can try the changes mentioned in the code below:

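region = "us-central1"
your_project_id = "your project id"

# Resource name of the operations collection; the region should match the API endpoint.
name = "projects/{}/regions/{}/operations".format(your_project_id, region)

response = dataproc_cluster_client.list_operations(
    request={"filter": f"operationType = CREATE AND labels.goog-dataproc-location:{region}", "name": name}
)
# The response is a message; the operations themselves are in its operations field.
for op in response.operations:
    print(op)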

Alternatively, here is a complete example that calls the operations client on the transport directly:

from google.cloud import dataproc_v1 as dataproc

region = "us-central1"
project = "your project"

dataproc_cluster_client = dataproc.ClusterControllerClient(
   client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)


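# Build the resource name from the same region used for the endpoint.
name = "projects/{}/regions/{}/operations".format(project, region)

# An empty filter returns all operations under this resource name.
operations = dataproc_cluster_client.transport.operations_client.list_operations(name, filter_="")

for i in operations:
    print(i.name)
    print("\n")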
Answered By: kiran mathew

I finally figured it out through trial and error, though I am still confused about why clear documentation on this is missing (or hard to find).

The confusing "resource field value" is the REST resource name used by the Google Cloud REST API, like so:

ops_resource_name = f"projects/{project}/regions/{region}/operations"

This value must be passed as the name entry in the request dictionary, alongside the filter. The call returns a response object whose operations field holds the results. The corrected code is below:

from google.cloud import dataproc_v1

project = 'xxxxxxx'
region = 'us-west1'

client_options = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}

#dataproc_job_client = dataproc_v1.JobControllerClient(client_options=client_options)
dataproc_cluster_client = dataproc_v1.ClusterControllerClient(client_options=client_options)

def list_operations(dataproc_cluster_client, project, region):
    ops_resource_name = f"projects/{project}/regions/{region}/operations"
    ops_filter = "operationType = CREATE"
    response = dataproc_cluster_client.list_operations(
        request={"name": ops_resource_name, "filter": ops_filter}
    )
    for op in response.operations:
        print(op)


list_operations(dataproc_cluster_client, project, region)
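
Since the fix comes down to building the right request dictionary, that part can be pulled into a small helper and checked without calling the API. This is only a sketch: build_list_operations_request and regional_client_options are hypothetical helper names of my own, not part of google-cloud-dataproc, and the project ID in the usage lines is a placeholder.

```python
def regional_client_options(region: str) -> dict:
    """Client options pointing at the regional Dataproc endpoint (hypothetical helper)."""
    return {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}


def build_list_operations_request(project: str, region: str, op_filter: str = "") -> dict:
    """Build the request dict for list_operations (hypothetical helper).

    The "name" entry is the REST resource name of the operations collection;
    leaving it out is what triggered the INVALID_ARGUMENT error above.
    """
    if not project or not region:
        raise ValueError("project and region must be non-empty")
    request = {"name": f"projects/{project}/regions/{region}/operations"}
    if op_filter:
        # Only add the filter when one was given.
        request["filter"] = op_filter
    return request


if __name__ == "__main__":
    # Placeholder values; replace with your own project and region.
    print(regional_client_options("us-west1"))
    print(build_list_operations_request("my-project", "us-west1", "operationType = CREATE"))
```

The resulting dictionary can then be passed as request=... to dataproc_cluster_client.list_operations(), with the client created from the matching regional options so the endpoint and the resource name refer to the same region.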

Answered By: egordoe