How do I list my scheduled queries via the Python google client API?

Question:

I have set up my service account and I can run queries on BigQuery using client.query().

I could simply rewrite all my scheduled queries in this new client.query() format, but I already have many scheduled queries, so I was wondering whether there is a way to get/list the scheduled queries and then use that information to run those queries from a script.

In short: I can query BigQuery, but I want to list my scheduled queries.

Asked By: Nicholas O'Connell


Answers:

Using the Cloud SDK, there is a command that gets you what you want and more:

bq ls --transfer_config --transfer_location=US --format=prettyjson

More about that here: List Scheduled Queries in BigQuery

Executing this within your command prompt (given the Google Cloud SDK is installed) prints a JSON array of your transfer configs; the scheduled query SQL itself is in the params.query field.


Following that, you can run this as a shell subprocess within Python and parse the output:

import json
from subprocess import PIPE, run

import pandas as pd

# Run the bq CLI as a shell subprocess and capture its output as text
response = run('bq ls --transfer_config --transfer_location=US --format=prettyjson',
               stdout=PIPE,
               stderr=PIPE,
               universal_newlines=True,
               shell=True)

print(response)

Here are the first few lines resulting from the above:

CompletedProcess(args='bq ls --transfer_config --transfer_location=US --format=prettyjson', returncode=0, stdout='[\n  {\n    "dataSourceId": "scheduled_query",...

Then, to get to the SQL, you can access the output via response.stdout, parse it as JSON, and either walk the resulting dictionaries to the desired fields or load it into a pandas DataFrame and go from there, like below:

data = json.loads(response.stdout)
df = pd.json_normalize(data)

df.columns =

dataSourceId
datasetRegion
destinationDatasetId
disabled
displayName
name
schedule
state
updateTime
userId
emailPreferences.enableFailureEmail
params.destination_table_name_template
params.query    ### the SQL is located in this one
params.write_disposition
scheduleOptions.startTime
params.overwrite_destination_table
params.source_dataset_id
params.source_project_id
scheduleOptions.endTime
nextRunTime
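
For example, a minimal sketch (using the df built above) of pulling each scheduled query's name and SQL out of those columns:

# Hedged sketch: iterate the normalized DataFrame from above and print
# each scheduled query's display name and its SQL (the params.query column)
for _, row in df.iterrows():
    print(row['displayName'])
    print(row['params.query'])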
Answered By: Yev Guyduy

Yes, you can use the APIs. When you don't know which one to use, I have a tip: use the command proposed by @Yev

bq ls --transfer_config --transfer_location=US --format=prettyjson

But log the API calls. For that, use the --apilog <logfile name> parameter, like this:

bq --apilog ./log ls --transfer_config --transfer_location=US --format=prettyjson

And, magically, you can find the API called by the command:
https://bigquerydatatransfer.googleapis.com/v1/projects/<PROJECT-ID>/locations/US/transferConfigs?alt=json

Then, a simple Google search leads you to the correct documentation.
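
As a side note, here is a minimal sketch (assuming the google-auth package and application default credentials are set up, with US as the transfer location) of calling that REST endpoint directly:

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Application default credentials with the cloud-platform scope
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
session = AuthorizedSession(credentials)

# The endpoint discovered via --apilog above
url = ("https://bigquerydatatransfer.googleapis.com/v1/"
       f"projects/{project_id}/locations/US/transferConfigs")
resp = session.get(url)
resp.raise_for_status()
print(resp.json())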


In Python, add this dependency to your requirements.txt: google-cloud-bigquery-datatransfer, and use this code:

from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()
# Builds "projects/<PROJECT-ID>"; replace <PROJECT-ID> with your project ID
parent = client.common_project_path("<PROJECT-ID>")
resp = client.list_transfer_configs(parent=parent)
print(resp)
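
From there, a minimal sketch of printing only the scheduled queries and their SQL (the field names come from the returned TransferConfig objects):

for config in resp:
    # Scheduled queries have data_source_id == "scheduled_query";
    # the SQL text lives in the params struct under "query"
    if config.data_source_id == "scheduled_query":
        print(config.display_name)
        print(config.params["query"])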
Answered By: guillaume blaquiere

It might be me, but I had a hard time finding Python documentation on using the BigQuery Data Transfer API (= scheduled queries).

In Python you can do:

from google.cloud import bigquery_datatransfer

bq_datatransfer_client = bigquery_datatransfer.DataTransferServiceClient()
request_datatransfers = bigquery_datatransfer.ListTransferConfigsRequest(
    # if US, you can just do parent='projects/YOUR_PROJECT_ID'
    parent='projects/YOUR_PROJECT_ID/locations/EU',  
)

# this method will also deal with pagination
response_datatransfers = bq_datatransfer_client.list_transfer_configs(
    request=request_datatransfers)

# to convert the response to a list of scheduled queries
datatransfers = list(response_datatransfers) 
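
Tying back to the original question, here is a hedged sketch (assuming the google-cloud-bigquery package) of running each scheduled query's SQL from a script:

from google.cloud import bigquery

bq_client = bigquery.Client()

for transfer_config in datatransfers:
    # Only configs of this type are scheduled queries
    if transfer_config.data_source_id != "scheduled_query":
        continue
    sql = transfer_config.params["query"]  # the scheduled query's SQL text
    job = bq_client.query(sql)
    job.result()  # wait for the query to finish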

Below are some useful resources on the API: the .list_transfer_configs() method, the ListTransferConfigsRequest class, code snippets and examples on how to use the Python API, and some official documentation on the use of the API.

Answered By: Sander van den Oord

With Python you can do:

from google.cloud import bigquery_datatransfer

bq_datatransfer_client = bigquery_datatransfer.DataTransferServiceClient()

parent = 'projects/YOUR_PROJECT_ID/locations/EU'
resp_datatransfers = bq_datatransfer_client.list_transfer_configs(parent=parent)

datatransfers = list(resp_datatransfers)

Link to documentation: .list_transfer_configs()
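
For example, a quick sketch of printing each config's name and schedule string:

for cfg in datatransfers:
    print(cfg.display_name, "->", cfg.schedule)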

Answered By: Sander van den Oord

To get any specific columns:

bq ls --transfer_config --transfer_location=US --format=prettyjson | jq -r '.[] | [.name, .displayName, .dataSourceId, .state, .userId] | @csv' | tr -d '"'
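
A rough Python equivalent of that pipeline (a sketch assuming the bq CLI is installed and the US transfer location) could be:

import csv
import json
import subprocess
import sys

# Run bq ls and capture the prettyjson output
out = subprocess.run(
    ["bq", "ls", "--transfer_config", "--transfer_location=US",
     "--format=prettyjson"],
    capture_output=True, text=True, check=True).stdout

# Select the same fields as the jq filter and emit CSV rows
writer = csv.writer(sys.stdout)
for cfg in json.loads(out):
    writer.writerow([cfg.get(k) for k in
                     ("name", "displayName", "dataSourceId", "state", "userId")])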
Answered By: Vadiraj k.s