How to set dependencies between DAGs in Airflow?
Question:
I am using Airflow to schedule batch jobs. I have one DAG (A) that runs every night and another DAG (B) that runs once per month. B depends on A having completed successfully. However B takes a long time to run and so I would like to keep it in a separate DAG to allow better SLA reporting.
How can I make running DAG B dependent on a successful run of DAG A on the same day?
Answers:
You can achieve this behavior using an operator called ExternalTaskSensor.
Your task (B1) in DAG B will be scheduled and will wait for task (A2) in DAG A to succeed.
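One thing to watch with a nightly DAG and a monthly DAG: by default ExternalTaskSensor waits for a run of the external DAG with the same execution date as its own run, which for a monthly run is usually not the nightly run from the day B actually starts. The sensor accepts either `execution_delta` or `execution_date_fn` to point it at a different run. Below is a minimal sketch of a function that could be passed as `execution_date_fn`. The function name is made up, and it assumes both DAGs are scheduled at midnight and that B should wait for A's last daily run of the same month; adjust the mapping to whatever "same day" means for your schedules.

```python
from datetime import datetime, timedelta


def last_daily_run_of_month(execution_date):
    """Map a monthly execution date to the execution date of the
    last daily run that falls in the same month.

    Hypothetical helper: assumes both DAGs run on midnight schedules.
    """
    # Jump to the first day of the following month. Setting day=1 first
    # avoids invalid dates such as "February 31".
    if execution_date.month == 12:
        next_month = execution_date.replace(
            year=execution_date.year + 1, month=1, day=1
        )
    else:
        next_month = execution_date.replace(
            month=execution_date.month + 1, day=1
        )
    # One day before the next month starts is the last day of this month.
    return next_month - timedelta(days=1)


# Plain-Python check of the mapping, no Airflow needed.
january_target = last_daily_run_of_month(datetime(2020, 1, 1))
```

It would then be wired in roughly as `ExternalTaskSensor(..., execution_date_fn=last_daily_run_of_month)`.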
It looks like a TriggerDagRunOperator can be used as well, and you can use a Python callable to add some logic, as explained here: https://www.linkedin.com/pulse/airflow-lesson-1-triggerdagrunoperator-siddharth-anand
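As a rough illustration of the "add some logic" part, here is a sketch of such a callable. It assumes the Airflow 1.10-style contract, where the callable receives the task context and a dag-run object and returns that object to trigger the target DAG or None to skip. Later releases changed this operator, so check the docs for your version. The function name and the `TRIGGER_DAY` constant are made up for the example, as is the choice to trigger on the run whose execution date is the 1st.

```python
from datetime import datetime

# Hypothetical setting: the day of the month whose nightly run of DAG A
# should kick off DAG B.
TRIGGER_DAY = 1


def maybe_trigger_b(context, dag_run_obj):
    """Return dag_run_obj to trigger DAG B, or None to skip this run."""
    if context["execution_date"].day == TRIGGER_DAY:
        return dag_run_obj
    return None


# Plain-Python check with a stand-in for the dag-run object.
stand_in = object()
triggered = maybe_trigger_b({"execution_date": datetime(2020, 3, 1)}, stand_in)
skipped = maybe_trigger_b({"execution_date": datetime(2020, 3, 2)}, stand_in)
```

With this approach the last task of DAG A would be the TriggerDagRunOperator, so B only starts once everything upstream in A has succeeded.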
When cross-DAG dependency is needed, there are often two requirements:
- Task B1 on DAG B needs to run after task A1 on DAG A is done. This can be achieved using ExternalTaskSensor, as others have mentioned:
    B1 = ExternalTaskSensor(task_id="B1",
                            external_dag_id="A",
                            external_task_id="A1",
                            mode="reschedule")
- When a user clears task A1 on DAG A, we want Airflow to clear task B1 on DAG B so that it re-runs. This can be achieved using ExternalTaskMarker (since Airflow v1.10.8):
    A1 = ExternalTaskMarker(task_id="A1",
                            external_dag_id="B",
                            external_task_id="B1")
Please see the doc about cross-DAG dependencies for more details: https://airflow.apache.org/docs/stable/howto/operator/external.html