Dynamic BigQuery query in a Dataflow template

Question:

I’ve written a Dataflow job that works great when I run it manually. Here is the relevant section (with some validation code removed for clarity):

parser.add_argument('--end_datetime',
                    dest='end_datetime')
known_args, pipeline_args = parser.parse_known_args(argv)

query = <redacted SQL String with a placeholder for a date>
query = query.replace('#ENDDATETIME#', known_args.end_datetime)

with beam.Pipeline(options=pipeline_options) as p:
    rows = p | 'read query' >> beam.io.Read(beam.io.BigQuerySource(query=query, use_standard_sql=True))
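For context, the placeholder substitution above happens at pipeline-construction time, which is why it works for a manual run: the string is fully resolved before the pipeline is built. A minimal standalone sketch of just that step (the table name and timestamp below are made up for illustration):

```python
# Hypothetical query template; the real SQL was redacted in the question.
QUERY_TEMPLATE = (
    "SELECT * FROM `my_project.my_dataset.events` "
    "WHERE event_ts < TIMESTAMP('#ENDDATETIME#')"
)

# In the question this value comes from the --end_datetime argument.
end_datetime = '2020-01-01 00:00:00'

# Plain string substitution, done before the pipeline is constructed.
query = QUERY_TEMPLATE.replace('#ENDDATETIME#', end_datetime)
print(query)
```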

Now I want to create a template and schedule it to run on a regular basis with a dynamic ENDDATETIME. As I understand it, in order to do this I need to change add_argument to add_value_provider_argument per this documentation:

https://cloud.google.com/dataflow/docs/templates/creating-templates

Unfortunately, it appears that ValueProvider values are not available when I need them; they're only available inside the pipeline itself (please correct me if I'm wrong here…). So I'm kind of stuck.

Does anyone have any pointers on how I could get a dynamic date into my query in a Dataflow template?

Asked By: Mike Keyes


Answers:

Python currently supports ValueProvider options only for FileBasedSource I/Os. Since BigQuerySource is not file-based, its query string cannot be supplied as a runtime template parameter. You can see this by clicking the Python tab under the "Pipeline I/O and runtime parameters" section at the link you used:
https://cloud.google.com/dataflow/docs/templates/creating-templates
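To illustrate why the question's approach breaks in a template, here is a hypothetical stand-in class (not Beam's real API) that mimics the semantics of a RuntimeValueProvider: calling `.get()` during pipeline construction raises, and the value only resolves once the job is running, which is too late for a construction-time `query.replace(...)`:

```python
# Hypothetical stand-in for Beam's RuntimeValueProvider, for illustration
# only -- this is NOT Beam's actual class. It mimics the key behavior:
# the value is unavailable while the pipeline is being constructed and
# only resolvable once the job is running.
class FakeRuntimeValueProvider:
    def __init__(self):
        self._value = None
        self._resolved = False

    def set(self, value):
        # In real Beam, the runner supplies the value when the template
        # is launched; nothing in user code calls this directly.
        self._value = value
        self._resolved = True

    def get(self):
        if not self._resolved:
            raise RuntimeError('not accessible at pipeline construction time')
        return self._value


QUERY = "SELECT * FROM `my_project.my_dataset.events` WHERE ts < '#ENDDATETIME#'"
provider = FakeRuntimeValueProvider()

# Construction time: query.replace(..., provider.get()) would fail here,
# which is why the question's pattern cannot work in a template.
try:
    provider.get()
except RuntimeError as err:
    print('construction time:', err)

# Runtime (e.g. inside a DoFn's process()): the value exists, so .get()
# succeeds -- but by then BigQuerySource's query is already fixed.
provider.set('2020-01-01T00:00:00')
print('runtime query:', QUERY.replace('#ENDDATETIME#', provider.get()))
```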