airflow

Exposing the right port to airflow services in docker

Exposing the right port to airflow services in docker Question: I’m trying to build a minimal data pipeline using docker, postgres, and airflow. My docker-compose.yaml file can be found here and is extended from airflow’s documentation here. I’ve extended it to include a separate postgres database where I will write data, and a pgadmin instance (these …

Total answers: 2
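A minimal sketch of the usual port layout for this setup (service names, image tags, and port numbers here are illustrative, not taken from the asker's file): each service needs its own host port, while containers reach each other over the compose network using the service name and the container port.

```yaml
# Hypothetical service names; the compose file in the question may differ.
services:
  postgres-data:          # extra warehouse DB, separate from Airflow's metadata DB
    image: postgres:15
    ports:
      - "5433:5432"       # host 5433 -> container 5432 (host 5432 often taken by the metadata DB)
  pgadmin:
    image: dpage/pgadmin4
    ports:
      - "5050:80"         # pgadmin UI on http://localhost:5050
```

Note that from inside the network, pgadmin connects to host `postgres-data` on port 5432; the remapped host port 5433 only matters for clients on the host machine.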

Is it possible to trigger a DAG from an on_failure_callback?

Is it possible to trigger a DAG from an on_failure_callback? Question: I would like to trigger a DAG when a task fails. I want to use the "on_failure_callback", however, I have not found information about it. Do you know if it is possible to trigger the DAG from the "on_failure_callback"? In the past, I …

Total answers: 2
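One pattern that can answer this, sketched under the assumption of Airflow 2.x (`recovery_dag` and the conf keys are hypothetical, not from the question): instantiate a `TriggerDagRunOperator` inside the callback and invoke its `execute()` with the callback's context. This is a fragment, not runnable outside an Airflow deployment.

```python
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

def trigger_recovery_dag(context):
    # The callback receives the failed task instance's context and can
    # reuse it to fire the trigger immediately.
    TriggerDagRunOperator(
        task_id="trigger_recovery_dag",
        trigger_dag_id="recovery_dag",          # hypothetical target DAG id
        conf={"failed_task": context["task_instance"].task_id},
    ).execute(context)

# wired up on the task or DAG:  on_failure_callback=trigger_recovery_dag
```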

Error to load pickle file in Apache Airflow

Error to load pickle file in Apache Airflow Question: Hi all! Could you please help me load the serialized file in Python and reproduce it in Airflow? My code: path = r'/Models/APP/model.pkl' with open(path, 'rb') as f: g = pickle.load(f) def my_fucn(gg): return gg.predict([[30, 40, 50, 60]]) default_args = { 'owner': "timur", 'retry_delay': datetime.timedelta(minutes=5), } …

Total answers: 1
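A self-contained sketch of the load-then-predict flow (`TinyModel` is a stand-in for the asker's real model, which we cannot see). One frequent cause of pickle errors in Airflow specifically: `pickle.load` needs the model's class importable under the same module path on the worker, and loading at DAG-file import time runs on every scheduler parse, so doing the load inside the task callable is usually safer.

```python
import os
import pickle
import tempfile

class TinyModel:
    """Stand-in for the pickled model (the real one was trained elsewhere)."""
    def predict(self, rows):
        return [sum(r) for r in rows]

# Produce a pickle the way the model file would have been written ...
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(TinyModel(), f)

# ... and load it back inside the callable, the way a task should.
def my_func():
    with open(path, "rb") as f:
        g = pickle.load(f)
    return g.predict([[30, 40, 50, 60]])

print(my_func())  # [180]
```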

airflow webserver showing next run as start of data interval

airflow webserver showing next run as start of data interval Question: I have a dag like that: @dag( dag_id = "data-sync", schedule_interval = '*/30 * * * *', start_date=pendulum.datetime(2023, 3, 9, tz="Asia/Hong_Kong"), catchup=False, dagrun_timeout=timedelta(minutes=20), ) So it runs every 30 minutes, starting today in my timezone, with no catchup. In the webserver UI I have …

Total answers: 1
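The behaviour described is Airflow 2's data-interval labelling: a run covering the interval [T, T+30m) only fires once that interval has closed, at T+30m, but the UI labels the run by its `data_interval_start`, i.e. T. A stdlib sketch of the arithmetic (the timestamp is illustrative):

```python
from datetime import datetime, timedelta

# For schedule_interval "*/30 * * * *", each run covers a 30-minute
# data interval and fires only when that interval has *closed*.
interval = timedelta(minutes=30)
data_interval_start = datetime(2023, 3, 9, 12, 0)   # what the UI labels the run with
data_interval_end = data_interval_start + interval
run_fires_at = data_interval_end                     # actual trigger moment

print(data_interval_start)  # 2023-03-09 12:00:00  <- shown as "next run"
print(run_fires_at)         # 2023-03-09 12:30:00  <- when it really executes
```

So the "next run" shown in the webserver sits one schedule interval before the wall-clock time the run will actually start; that offset is expected, not a bug.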

How to debug blocking in a python async function

How to debug blocking in a python async function Question: I have zero knowledge of asynchronous python other than several hours of searching stackoverflow posts, and am struggling to figure out what is, occasionally, causing the below error: Triggerer’s async thread was blocked for 0.26 seconds, likely by a badly-written trigger. Set PYTHONASYNCIODEBUG=1 to get …

Total answers: 1
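The message's own suggestion, `PYTHONASYNCIODEBUG=1`, can be reproduced in a small stdlib script: asyncio's debug mode logs a warning naming the coroutine whenever one step of the event loop runs longer than `loop.slow_callback_duration` (0.1 s by default). A minimal sketch of a "badly-written trigger" being caught:

```python
import asyncio
import logging
import time

async def bad_trigger():
    # time.sleep() blocks the whole event loop -- exactly the kind of
    # call the "async thread was blocked" message is complaining about.
    time.sleep(0.3)

async def main():
    await bad_trigger()

# debug=True is the in-code equivalent of PYTHONASYNCIODEBUG=1: any step
# over loop.slow_callback_duration is logged on the "asyncio" logger,
# e.g. "Executing <Task ... coro=<main() ...>> took 0.301 seconds",
# which points at the coroutine doing the blocking.
logging.basicConfig(level=logging.WARNING)
asyncio.run(main(), debug=True)
```

The fix is then to replace the named blocking call with its async equivalent (`await asyncio.sleep(...)`, `loop.run_in_executor(...)` for CPU-bound or blocking-IO work).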

airflow.exceptions.AirflowException: 'branch_task_ids' must contain only valid task_ids

airflow.exceptions.AirflowException: 'branch_task_ids' must contain only valid task_ids Question: I have a dag which contains 1 custom task, 1 @task.branch-decorated task, and 1 TaskGroup; inside the TaskGroup I have multiple tasks that need to be triggered sequentially depending on the outcome of the @task.branch. PROCESS_BATCH_data_FILE = "batch_upload" SINGLE_data_FILE_FIRST_OPERATOR = "validate_data_schema_task" ENSURE_INTEGRITY_TASK = "provide_data_integrity_task" PROCESS_SINGLE_data_FILE = …

Total answers: 2
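A common cause of this exception when a TaskGroup is involved, sketched as an untested Airflow fragment (`single_file_checks` is a hypothetical `group_id`, not from the question): tasks inside a TaskGroup get ids of the form `<group_id>.<task_id>`, so the branch callable must return the prefixed id, not the bare one.

```python
from airflow.decorators import task

PROCESS_BATCH_data_FILE = "batch_upload"
SINGLE_data_FILE_FIRST_OPERATOR = "validate_data_schema_task"

@task.branch
def choose_path(is_batch: bool):
    if is_batch:
        return PROCESS_BATCH_data_FILE
    # Returning the bare "validate_data_schema_task" here is what raises
    # "'branch_task_ids' must contain only valid task_ids"; the id must
    # carry the TaskGroup prefix.
    return f"single_file_checks.{SINGLE_data_FILE_FIRST_OPERATOR}"
```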

airflow PostgresOperator report number of inserts/updates/deletes

airflow PostgresOperator report number of inserts/updates/deletes Question: I’m exploring replacing our home-built SQL file orchestration framework with Apache Airflow. We currently have extensive logging of execution time, history, and the number of records INSERTED/UPDATED/DELETED. The first two are supported by Airflow’s standard logging; however, I could not find a way to log the resulting counts of …

Total answers: 1
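One workable direction, hedged: instead of `PostgresOperator`, run the SQL from a task that uses a DB-API cursor (Airflow's `PostgresHook.get_conn()` returns one) and log `cursor.rowcount` after each statement. The mechanism is plain DB-API, shown here with stdlib `sqlite3` so the sketch runs anywhere; swap the connection for the hook's in a real task.

```python
import sqlite3

# DB-API cursors expose .rowcount after INSERT/UPDATE/DELETE; a task can
# read it and write the counts to the task log (or push them to XCom).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER)")

cur.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
inserted = cur.rowcount                      # 3 rows inserted

cur.execute("UPDATE t SET id = id + 10 WHERE id > 1")
updated = cur.rowcount                       # 2 rows updated (ids 2 and 3)

cur.execute("DELETE FROM t WHERE id = 12")
deleted = cur.rowcount                       # 1 row deleted

print(inserted, updated, deleted)  # 3 2 1
```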

Pass returned value from a previous Python operator task to another in airflow

Pass returned value from a previous Python operator task to another in airflow Question: I am new to Apache Airflow. I am building a DAG like the following to schedule tasks: def add(): return 1 + 1 def multiply(a): return a * 999 dag_args = { 'owner': 'me', 'depends_on_past': False, 'start_date': datetime(2023, 2, …

Total answers: 1
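The mechanism Airflow uses for this is XCom: a task's return value is stored under its task id, and the downstream task pulls it by that id (`ti.xcom_pull(task_ids="add")`, or implicitly via the TaskFlow API). A stdlib simulation of that handoff, using the question's own `add`/`multiply` functions, with the store and `run_task` helper being illustrative stand-ins rather than Airflow APIs:

```python
# Minimal stand-in for Airflow's XCom mechanics (no Airflow needed):
# a task's return value lands in a store keyed by task_id, and the
# downstream task pulls it by that key.
xcom_store = {}

def run_task(task_id, fn, *args):
    xcom_store[task_id] = fn(*args)   # what returning from a task does
    return xcom_store[task_id]

def add():
    return 1 + 1

def multiply(a):
    return a * 999

run_task("add", add)
result = run_task("multiply", multiply, xcom_store["add"])  # the "pull"
print(result)  # 1998
```

In real Airflow 2 code the same wiring is one line with the TaskFlow API: decorate both functions with `@task` and write `multiply(add())` inside the `@dag` function.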

Passing param to Airflow DAG from another DAG with TriggerDagRunOperator

Passing param to Airflow DAG from another DAG with TriggerDagRunOperator Question: I’m trying to pass a param to Airflow DAG from another DAG with TriggerDagRunOperator, here is the code: @dag(default_args=default_args, catchup=False, #schedule_interval=DAG_SCHEDULE_INTERVAL, dagrun_timeout=timedelta(seconds=3600), tags=["tag1"], doc_md=DOC_MD, max_active_runs=1) def parent_dag(date_start="", date_end=""): triggered_dag = TriggerDagRunOperator( task_id='triggered_dag', trigger_dag_id='triggered_dag', conf={"date_start": "{{date_start}}", "date_end": "{{date_start}}"} ) triggered_dag dag = parent_dag() The params …

Total answers: 1
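Two things stand out in the excerpt, sketched here as an untested Airflow fragment: the `conf` dict maps both keys to `date_start`, and `{{date_start}}` is not a defined template variable. Arguments of a `@dag` function become DAG params, so (assuming Airflow 2.x) they are reachable as `{{ params.<name> }}`, and `conf` is a templated field of `TriggerDagRunOperator`, so this renders at run time:

```python
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

triggered_dag = TriggerDagRunOperator(
    task_id="triggered_dag",
    trigger_dag_id="triggered_dag",
    conf={
        "date_start": "{{ params.date_start }}",
        "date_end": "{{ params.date_end }}",   # not date_start twice
    },
)
```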

Passing results of BigQuery query task to the next task while using template macro

Passing results of BigQuery query task to the next task while using template macro Question: This seems a peculiar struggle, so I’m sure I’m missing something. Somehow I can’t seem to pass values using XCom, unless I use functions to execute the tasks that provide and use the information and call them from PythonOperator. This …

Total answers: 2
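The piece that is often missed here: XCom values can be pulled directly inside any *templated* field via the `ti.xcom_pull` macro, so no intermediate `PythonOperator` is needed. An untested Airflow fragment with hypothetical task ids (a `BashOperator` is used only because its `bash_command` is templated; BigQuery operators such as `BigQueryInsertJobOperator` have templated fields that accept the same macro):

```python
from airflow.operators.bash import BashOperator

use_value = BashOperator(
    task_id="use_value",
    # Pulls the upstream task's XCom at render time, inside the template.
    bash_command="echo {{ ti.xcom_pull(task_ids='bq_query_task') }}",
)
```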