Pass returned value from a previous Python operator task to another in Airflow
Question:
I am new to Apache Airflow. I am building a DAG like the following to schedule tasks:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def add():
    return 1 + 1

def multiply(a):
    return a * 999

dag_args = {
    'owner': 'me',
    'depends_on_past': False,
    'start_date': datetime(2023, 2, 27),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 1,
    'retry_delay': timedelta(minutes=3),
}

with DAG(
    dag_id='dag',
    start_date=datetime(2023, 2, 27),
    default_args=dag_args,
    schedule_interval='@once',
    end_date=None,
) as dag:
    t1 = PythonOperator(task_id="t1",
                        python_callable=add,
                        dag=dag)
    t2 = PythonOperator(task_id="t2",
                        python_callable=multiply,
                        dag=dag)
As you can see, t2 depends on the result of t1.
Is there any way for me to pass the return value of t1 directly to t2? I am using Apache Airflow 2.5.1 and Python 3.9.
I did some research on XCom and found that all results of Airflow tasks are stored there, which can be accessed via code such as:
task_instance = kwargs['t1']
task_instance.xcom_pull(task_ids='t1')
Answers:
Your DAG can be simplified using the TaskFlow API, which handles the XCom for you and shortens the code.
import pendulum
from airflow.decorators import dag, task

@dag(
    schedule_interval=None,
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
)
def taskflow_api_etl():
    @task()
    def add():
        return 1 + 1

    @task()
    def multiply(a: int):
        return a * 99

    order_data = add()
    multiply(order_data)  # multiply uses the XCom generated by add()

etl_dag = taskflow_api_etl()
This code generates a DAG in which multiply runs after add. When executed, the add() task produces an XCom with value 2, which is passed to multiply as a.