How to list Dataproc operations in `google-cloud-dataproc` client
Question:
I am looking for a way to do something similar to the CLI's `gcloud dataproc operations list --filter "..."`.
The minimal code example:
from google.cloud import dataproc_v1

region = 'us-west1'
client_options = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
dataproc_cluster_client = dataproc_v1.ClusterControllerClient(client_options=client_options)

def list_operations(dataproc_cluster_client, region):
    for op in dataproc_cluster_client.list_operations(
        request={"filter": f"operationType = CREATE AND labels.goog-dataproc-location:{region}"}
    ):
        print(op)

list_operations(dataproc_cluster_client, region)
The error:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Invalid resource field value in the request."
debug_error_string = "UNKNOWN:Error received from peer ipv4:xxx.xxx.xx.xxx:443 {created_time:"2023-03-28T11:23:00.466125+02:00", grpc_status:3, grpc_message:"Invalid resource field value in the request."}"
What's wrong? I have failed to find any documentation about this "resource field value", its possible values, or how to actually pass it in the request.
Answers:
The error above occurs because the request is missing the `name` argument, which `dataproc_cluster_client.transport.operations_client.list_operations()` requires.
You can try the changes shown in the code below:
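region = "us-central1"
your_project_id = "your project id"
# Resource name of the operations collection for this project and region.
name = "projects/{}/regions/{}/operations".format(your_project_id, region)
# The response wraps the results, so iterate over its `operations` field.
for op in dataproc_cluster_client.list_operations(
    request={
        "filter": f"operationType = CREATE AND labels.goog-dataproc-location:{region}",
        "name": name,
    }
).operations:
    print(op)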
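The filter string in the snippet above joins two clauses with `AND`. As a minimal sketch (the helper name `build_filter` is hypothetical and not part of the Dataproc client), the clauses can be composed separately so that one can be dropped while debugging:

```python
def build_filter(*clauses: str) -> str:
    """Join the non-empty filter clauses with AND (hypothetical helper)."""
    return " AND ".join(clause for clause in clauses if clause)


region = "us-central1"
ops_filter = build_filter(
    "operationType = CREATE",
    f"labels.goog-dataproc-location:{region}",
)
print(ops_filter)
```

The resulting string can then be passed as the "filter" entry of the request dictionary.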
You can also use the sample code below as an example:
from google.cloud import dataproc_v1 as dataproc

region = "us-central1"
project = "your project"
dataproc_cluster_client = dataproc.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)
# Resource name of the operations collection for this project and region.
name = "projects/{}/regions/{}/operations".format(project, region)
# An empty filter lists every operation under the collection.
operations = dataproc_cluster_client.transport.operations_client.list_operations(name, filter_="")
for i in operations:
    print(i.name)
    print("\n")
Well, I was finally lucky enough to figure it out through trial and error. Still, I am a bit confused about why there is no clear documentation on it (or why it is so hard to find).
The confusing "resource field value" is a REST resource name in the Google Cloud REST API, like so:
ops_resource_name = f"projects/{project}/regions/{region}/operations"
This value must be passed as another entry in the request dictionary. The corrected code is below:
from google.cloud import dataproc_v1

project = 'xxxxxxx'
region = 'us-west1'
client_options = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
#dataproc_job_client = dataproc_v1.JobControllerClient(client_options=client_options)
dataproc_cluster_client = dataproc_v1.ClusterControllerClient(client_options=client_options)

def list_operations(dataproc_cluster_client, project, region):
    # Resource name of the operations collection for this project and region.
    ops_resource_name = f"projects/{project}/regions/{region}/operations"
    ops_filter = "operationType = CREATE"
    # The response exposes the results through its `operations` field.
    for op in dataproc_cluster_client.list_operations(
        request={"name": ops_resource_name, "filter": ops_filter}
    ).operations:
        print(op)

list_operations(dataproc_cluster_client, project, region)
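Since the region appears both in the API endpoint and in the resource name, it may help to derive both from the same variable. The following is only a sketch in plain Python: the helper names `regional_endpoint` and `operations_request` are hypothetical, and the project ID is a placeholder.

```python
def regional_endpoint(region: str) -> str:
    """Return the regional endpoint string used in the code above."""
    return f"{region}-dataproc.googleapis.com:443"


def operations_request(project: str, region: str, ops_filter: str = "") -> dict:
    """Build the request dictionary: the resource name plus an optional filter."""
    request = {"name": f"projects/{project}/regions/{region}/operations"}
    if ops_filter:
        request["filter"] = ops_filter
    return request


# "my-project" is a placeholder project ID.
request = operations_request("my-project", "us-west1", "operationType = CREATE")
print(regional_endpoint("us-west1"))
print(request)
```

The dictionary built this way has the same shape as the `request` argument passed to `list_operations()` in the corrected code.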