airflow

How to parse user_defined_macro in regular function or PythonOperator in Airflow

Question: We use managed Airflow inside a GCP project. When I used BigQueryInsertJobOperator to execute queries in a query file, it automatically replaced the user_defined_macros in those files with their set values. from airflow import DAG from datetime import datetime from airflow.providers.google.cloud.operators.bigquery import …

Total answers: 2

Airflow create new tasks based on task return value

Question: How can I get an XCom from an Airflow task and create other tasks using these values? For example: def func_test(): return ['task_2', 'task_3'] with DAG( 'dag_name', schedule_interval="@once", start_date=datetime(2022, 4, 19), catchup=False, default_args={ 'depends_on_past': False, 'retries': 0 } ) as dag: task_1 = PythonOperator( task_id='func_test', …

Total answers: 2

DBT – How to insert data twice in the same table using DBT?

Question: I have a scenario where I need to insert into the same table in two steps. First, I insert the parent rows and get the auto-incremented IDs from the database. Second, I need to insert the child data, which will …

Total answers: 1

How to load a BigQuery table from a file in GCS Bucket using Airflow?

Question: I am new to Airflow, and I am wondering how to load a file from a GCS bucket into BigQuery. So far, I have managed to go from BigQuery to a GCS bucket: bq_recent_questions_query = bigquery_operator.BigQueryOperator( task_id='bq_recent_questions_query', sql=""" SELECT owner_display_name, title, …

Total answers: 2

Apache Airflow idempotent DAG implementation

Question: I am generating a start and end time for an API query using the following: startTime = datetime.now(pytz.timezone('US/Eastern')) - timedelta(hours=1) endTime = datetime.now(pytz.timezone('US/Eastern')) This works well and generates the correct parameters for the API query. But I noticed that if the task fails and I try to …

Total answers: 3

What should I do when an Airflow task gets queued but never starts running?

Question: This is a simple question. My Airflow task sometimes gets queued but never starts running. Does it mean I have an error in my code? What should I do? My environment: Python 3.7, apache-airflow==2.2.2, aws-mwaa-local-runner. Asked By: masaaa015 || Source Answers: According …

Total answers: 1

Why do os.getppid() and multiprocessing.parent_process().pid give different results when using multiprocessing in Airflow 2.x?

Question: I found that using multiprocessing in Airflow causes an assertion error. I solved my error (this discussion and this discussion), but I was curious about how processes actually work in an Airflow job, so I ran the code. def process_function(i): …

Total answers: 1

How to get Airflow Docker ExternalPythonOperator working within a Python venv?

Question: Situation: Since the release of Apache Airflow 2.4.0 on 19 September 2022, Airflow supports ExternalPythonOperator. I have asked the main contributors as well, and I should be able to add 2 Python virtual environments to the base image of Airflow Docker 2.4.1 and …

Total answers: 1

Raise Airflow Exception to Fail Task from CURL request

Question: I am using Airflow to schedule and automate Python scripts housed on an Ubuntu server. The DAG triggers a curl request that hits a Flask API on the same machine, which actually runs the script. Here is a high-level overview of the flow: Airflow …

Total answers: 1