How to update rows in a BigQuery table using Airflow

Question:

I am coding a DAG and want to execute an UPDATE statement to selectively set the values of certain fields in certain rows. The SQL statement is easy, but I am not sure how to execute it via Airflow.

The documentation on BigQueryUpdateTableOperator here says that the entire dataset will be replaced. I tried searching for a long time and could not find the right operator.

I tried putting an UPDATE statement in BigQueryInsertJobOperator and that threw an error.

How do I execute an UPDATE query on BigQuery via Airflow? My DAG is within a GCP Composer environment.

Asked By: Shahid Thaika


Answers:

I used BigQueryInsertJobOperator and was able to run an UPDATE statement by storing it in an SQL file and then referencing that file in the query parameter.

Please see below the code I used in my testing:

from airflow import models
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.utils.dates import days_ago

dag_id = "update-dag"
my_final_taskid = 'update-bq'
sql_file = 'my-query.sql'

with models.DAG(
    dag_id,
    schedule_interval=None,  # Override to match your needs
    start_date=days_ago(1),
    tags=["example"],
) as dag:
    update_bq_table = BigQueryInsertJobOperator(
        task_id=my_final_taskid,
        configuration={
            "query": {
                "query": sql_file,
                "useLegacySql": False,
            }
        },
    )

Content of my my-query.sql:

update your-dataset.your-table set your_column = 'string' where another_column = 'string';
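One note on why this works: the value 'my-query.sql' is resolved through Jinja templating (the operator's template_ext covers .sql files), so the file's contents are loaded as the query at render time, assuming the file sits in the DAG folder or another template search path. If you would rather skip the separate file, you can also pass the statement inline; here is a minimal sketch, where the project, dataset, table, and column names are placeholders:

update_bq_table_inline = BigQueryInsertJobOperator(
    task_id='update-bq-inline',
    configuration={
        "query": {
            # Inline UPDATE statement instead of a .sql file; identifiers below are placeholders
            "query": (
                "UPDATE `your-project.your-dataset.your-table` "
                "SET your_column = 'string' "
                "WHERE another_column = 'string'"
            ),
            "useLegacySql": False,
        }
    },
)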
Answered By: Scott B

You have different options to do this. First of all:

The UpdateTable, CreateTable, and other *Table operators are meant to modify the table resource itself, such as its schema, structure, metadata, or partitioning; that is why they replace the entire table definition.

About the UPDATE statement, there are a couple of ways to do it. Since BigQueryExecuteQueryOperator is deprecated, you should use BigQueryInsertJobOperator, as the answer above mentions.

See the operator's page if you want to read more: https://registry.astronomer.io/providers/google/modules/bigqueryinsertjoboperator

Also, one important note: UPDATE statements are costly in BigQuery, which is not designed for row-level updates over large volumes of data. I suggest using them only for small amounts of data; otherwise, you can do something like an outer join that writes into a temporary table instead, which is usually cheaper, as sketched below.
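A minimal sketch of that idea with BigQueryInsertJobOperator: rebuild the affected column with a join and write the result to a staging table instead of updating in place. The project, dataset, table, and column names (including the hypothetical updates table with id and new_value columns) are placeholders:

rebuild_table = BigQueryInsertJobOperator(
    task_id='rebuild-table',
    configuration={
        "query": {
            # Recompute the column via a LEFT JOIN instead of an in-place UPDATE;
            # all identifiers below are placeholders
            "query": (
                "SELECT t.* REPLACE(COALESCE(u.new_value, t.your_column) AS your_column) "
                "FROM `your-project.your-dataset.your-table` AS t "
                "LEFT JOIN `your-project.your-dataset.updates` AS u "
                "ON t.id = u.id"
            ),
            "useLegacySql": False,
            # Write the joined result to a staging table rather than modifying rows in place
            "destinationTable": {
                "projectId": "your-project",
                "datasetId": "your-dataset",
                "tableId": "your-table-staging",
            },
            "writeDisposition": "WRITE_TRUNCATE",
        }
    },
)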

Answered By: Chaotic Pechan