How do I list my scheduled queries via the Python google client API?
Question:
I have set up my service account and I can run queries on BigQuery using client.query().
I could rewrite all my scheduled queries in this client.query() format, but I already have many scheduled queries, so I was wondering if there is a way to get/list the scheduled queries and then use that information to run them from a script.
Answers:
Using the Cloud SDK, there is a command that gets you what you want and more:
bq ls --transfer_config --transfer_location=US --format=prettyjson
More about that here – List Scheduled Queries in BigQuery
Executing this within your command prompt (given the Google Cloud SDK is installed) prints each transfer config as JSON, including the scheduled query's SQL.
You can then run this as a shell subprocess within Python and parse the output:
import json
from subprocess import PIPE, run

import pandas as pd

response = run('bq ls --transfer_config --transfer_location=US --format=prettyjson',
               stdout=PIPE,
               stderr=PIPE,
               universal_newlines=True,
               shell=True)
response
Here are the first few lines of the result:
CompletedProcess(args='bq ls --transfer_config --transfer_location=US --format=prettyjson', returncode=0, stdout='[\n {\n "dataSourceId": "scheduled_query",...
Then, to get to the SQL, you can access the output via response.stdout, parse it as JSON, and work your way through the dictionaries to the desired results, or get it into a pandas DataFrame and go from there, like below:
data = json.loads(response.stdout)
df = pd.json_normalize(data)
Inspecting df.columns shows the available fields:
dataSourceId
datasetRegion
destinationDatasetId
disabled
displayName
name
schedule
state
updateTime
userId
emailPreferences.enableFailureEmail
params.destination_table_name_template
### sql located in this one
params.query
params.write_disposition
scheduleOptions.startTime
params.overwrite_destination_table
params.source_dataset_id
params.source_project_id
scheduleOptions.endTime
nextRunTime
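To pull out the SQL itself, you can then select the params.query column. A minimal sketch, using a hand-written sample in the same shape as the bq output (trimmed to the fields used; in practice you would parse response.stdout instead):

```python
import json

import pandas as pd

# Hand-written sample in the assumed shape of the
# `bq ls --transfer_config --transfer_location=US --format=prettyjson` output.
sample = '''[
  {
    "dataSourceId": "scheduled_query",
    "displayName": "daily_rollup",
    "params": {"query": "SELECT 1", "write_disposition": "WRITE_TRUNCATE"}
  }
]'''

data = json.loads(sample)
df = pd.json_normalize(data)

# json_normalize flattens the nested params dict, so the scheduled query's
# SQL lives in the "params.query" column.
sql_by_name = dict(zip(df["displayName"], df["params.query"]))
print(sql_by_name)  # {'daily_rollup': 'SELECT 1'}
```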
Yes, you can use the APIs. When you don't know which one to use, here is a tip: use the command proposed by @Yev
bq ls --transfer_config --transfer_location=US --format=prettyjson
but log the API calls. For that, use the --apilog <logfile name> parameter, like this:
bq --apilog ./log ls --transfer_config --transfer_location=US --format=prettyjson
And, magically, you can find the API called by the command:
https://bigquerydatatransfer.googleapis.com/v1/projects/<PROJECT-ID>/locations/US/transferConfigs?alt=json
Then a simple Google search leads you to the correct documentation.
In Python, add this dependency to your requirements.txt: google-cloud-bigquery-datatransfer, and use this code:
from google.cloud import bigquery_datatransfer
client = bigquery_datatransfer.DataTransferServiceClient()
parent = client.common_project_path("<PROJECT-ID>")
resp = client.list_transfer_configs(parent=parent)
print(resp)
It might be me, but I had a hard time finding Python documentation on using the BigQuery Data Transfer API (= scheduled queries).
In Python you can do:
from google.cloud import bigquery_datatransfer
bq_datatransfer_client = bigquery_datatransfer.DataTransferServiceClient()
request_datatransfers = bigquery_datatransfer.ListTransferConfigsRequest(
# if US, you can just do parent='projects/YOUR_PROJECT_ID'
parent='projects/YOUR_PROJECT_ID/locations/EU',
)
# this method will also deal with pagination
response_datatransfers = bq_datatransfer_client.list_transfer_configs(
request=request_datatransfers)
# to convert the response to a list of scheduled queries
datatransfers = list(response_datatransfers)
Below are some useful resources on the API: the documentation for the .list_transfer_configs() method, the ListTransferConfigsRequest class, code snippets and examples on how to use the Python API, and the official documentation on the use of the API.
With Python you can do:
from google.cloud import bigquery_datatransfer
bq_datatransfer_client = bigquery_datatransfer.DataTransferServiceClient()
parent = 'projects/YOUR_PROJECT_ID/locations/EU'
resp_datatransfers = bq_datatransfer_client.list_transfer_configs(parent=parent)
datatransfers = list(resp_datatransfers)
Link to documentation: .list_transfer_configs()
To get any specific columns:
bq ls --transfer_config --transfer_location=US --format=prettyjson | jq -r '.[]|[.name,.displayName,.dataSourceId,.state,.userId]|@csv' | tr -d '"'
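If you would rather stay in Python than shell out to jq, the same column extraction can be sketched with the standard library (the sample JSON is hand-written in the assumed shape of the bq output; in practice, feed it the real command output):

```python
import csv
import io
import json

# Hand-written sample in the assumed shape of the
# `bq ls --transfer_config` JSON output.
sample = '''[
  {"name": "projects/123/locations/us/transferConfigs/abc",
   "displayName": "daily_rollup",
   "dataSourceId": "scheduled_query",
   "state": "SUCCEEDED",
   "userId": "456"}
]'''

buf = io.StringIO()
writer = csv.writer(buf)
for cfg in json.loads(sample):
    # Same five fields the jq pipeline above selects.
    writer.writerow([cfg["name"], cfg["displayName"], cfg["dataSourceId"],
                     cfg["state"], cfg["userId"]])
print(buf.getvalue().strip())
```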