Airflow: BigQuery SQL Insert empty data to the table

Question:

Using Airflow, I am trying to copy data from one BigQuery table into another. I have 5 origin tables and 5 destination tables. My SQL query and Python logic work for 4 of the tables, successfully fetching the data and inserting it into their respective destination tables, but they do not work for 1 table.

import logging
import time

from google.cloud import bigquery

query = '''SELECT * EXCEPT(eventdate) FROM `gcp_project.gcp_dataset.gcp_table_1`
    WHERE id = "1234"
    AND eventdate = "2023-01-18"
'''

# Delete the previous destination table if it exists
bigquery_client.delete_table("gcp_project.gcp_dataset.dest_gcp_table_1", not_found_ok=True)

job_config = bigquery.QueryJobConfig()

table_ref = bigquery_client.dataset(gcp_dataset).table(dest_gcp_table_1)
job_config.destination = table_ref
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

# Start the query, passing in the extra configuration.
query_job = bigquery_client.query(query=query,
      location='US',
      job_config=job_config
     )

# Check if the table is successfully written
while not query_job.done():
    time.sleep(1)
logging.info("Data is written into a destination table with {} number of rows for id {}."
             .format(query_job.result().total_rows, id))

I have even tried the SQL query with CREATE OR REPLACE TABLE, but the result was the same: table_1 still comes out empty. I have also tried BigQueryInsertJobOperator, but table_1 still comes out empty.

  • Note: Table_1's data is around 270 MB with 1,463,306 rows; it is also the biggest of all the tables when it comes to inserting the data into another table.
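For reference, the BigQueryInsertJobOperator attempt mentioned above boils down to a query-job configuration like the sketch below. The task id, project, dataset, and table names are placeholders taken from the question's example; the dict is shown on its own so the structure is visible:

```python
# Minimal sketch of the dict that BigQueryInsertJobOperator's `configuration`
# argument would receive (project/dataset/table names are placeholders).
configuration = {
    "query": {
        "query": (
            'SELECT * EXCEPT(eventdate) FROM `gcp_project.gcp_dataset.gcp_table_1` '
            'WHERE id = "1234" AND eventdate = "2023-01-18"'
        ),
        "useLegacySql": False,
        "destinationTable": {
            "projectId": "gcp_project",
            "datasetId": "gcp_dataset",
            "tableId": "dest_gcp_table_1",
        },
        # Same effect as bigquery.WriteDisposition.WRITE_TRUNCATE in the client code
        "writeDisposition": "WRITE_TRUNCATE",
    }
}

# Inside a DAG this would be wired up roughly as:
# BigQueryInsertJobOperator(task_id="copy_table_1", configuration=configuration)
```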

I tried to execute the above logic from my local machine and it works fine for table_1 as well; I can see the data in GCP BigQuery.

I am not sure what is happening behind the scenes. Does anyone have any idea why this is happening or what could be causing it?

Asked By: terraCoder


Answers:

Found the root cause for this: the previous query, which is responsible for populating the origin table, was still running in the GCP BigQuery backend. Because of that, the query above did not get any data.

Solution: introduced query_job.result() on the populating job. This waits for that job to complete before the next query executes.
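The race can be illustrated with a small self-contained sketch. The in-memory "table" and the two functions below are stand-ins for the BigQuery jobs, not real client calls; the point is that `.result()` on the populating job blocks, exactly like `QueryJob.result()`, so the copy only runs once the origin table actually has data:

```python
import time
from concurrent.futures import ThreadPoolExecutor

store = {"origin": []}  # stand-in for the origin table

def populate_origin_table():
    # Simulates the slow upstream query that fills the origin table
    time.sleep(0.2)
    store["origin"] = [{"id": "1234"}] * 3
    return len(store["origin"])

def copy_to_destination():
    # Simulates the SELECT ... into the destination table;
    # it reads whatever rows exist at the moment it runs
    return list(store["origin"])

executor = ThreadPoolExecutor()
populate_job = executor.submit(populate_origin_table)

# Without waiting, the copy can race the populate job and read an
# empty origin table, which is the behaviour seen in the question:
# rows = copy_to_destination()  # may return []

# The fix: block until the populating job finishes, mirroring query_job.result()
populate_job.result()
rows = copy_to_destination()
print(len(rows))  # prints 3
```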

Answered By: terraCoder