How to execute multiple SQL files in Airflow using PostgresOperator?
Question:
I have multiple SQL files in my sql folder, and I am not sure how to execute all of them within a DAG.
- dags
  - sql
    - dummy1.sql
    - dummy2.sql
For a single file, the code below works:
sql_insert = PostgresOperator(task_id='sql_insert',
                              postgres_conn_id='postgres_conn',
                              sql='sql/dummy1.sql')
Answers:
Pass a list of files:
sql_insert = PostgresOperator(task_id='sql_insert',
                              postgres_conn_id='postgres_conn',
                              sql=['sql/dummy1.sql', 'sql/dummy2.sql'])
Or you can make it dynamic:
import glob
sql_insert = PostgresOperator(task_id='sql_insert',
                              postgres_conn_id='postgres_conn',
                              sql=glob.glob("sql/*.sql"))
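One caveat with the glob approach: glob.glob does not guarantee any particular order, and the operator runs the entries in list order. If the scripts depend on each other, sorting the result is a cheap safeguard. A minimal sketch using only the standard library; the helper name `ordered_sql_files` is made up for illustration and is not part of Airflow:

```python
import glob
import os


def ordered_sql_files(folder):
    """Return the .sql files in `folder`, sorted by path.

    `ordered_sql_files` is a hypothetical helper, not an Airflow function.
    """
    pattern = os.path.join(folder, "*.sql")
    # glob.glob returns matches in arbitrary order, so sort explicitly.
    return sorted(glob.glob(pattern))
```

The result can stand in for the bare glob.glob("sql/*.sql") call. Naming the files so that they sort in the intended order (numeric prefixes are one common choice) then controls the sequence in which they run.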
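Adding to @Javier Lopez Tomas’ answer: to do this dynamically, the file paths must be relative to the template_searchpath you specify when instantiating your DAG. glob returns paths in the same form as the pattern it is given, and a relative pattern is resolved against the current working directory rather than the DAG folder. Globbing under template_searchpath and then stripping that prefix avoids the mismatch:
import glob
import os

# Assumes template_searchpath is a single directory string, the same one passed to the DAG.
sql_files = sorted(glob.glob(os.path.join(template_searchpath, "sql", "*.sql")))
sql_insert = PostgresOperator(task_id='sql_insert',
                              postgres_conn_id='postgres_conn',
                              sql=[os.path.relpath(f, template_searchpath) for f in sql_files])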
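To check what the operator would receive, the path handling can be tried outside Airflow. The sketch below uses only the standard library; the helper name `relative_sql_paths` and its `subdir` parameter are assumptions for illustration, and `search_path` stands in for whatever directory you pass as template_searchpath:

```python
from pathlib import Path


def relative_sql_paths(search_path, subdir="sql"):
    """List *.sql files under search_path/subdir, relative to search_path.

    Hypothetical helper, not part of Airflow. `search_path` is assumed to
    be a single directory, matching the DAG's template_searchpath.
    """
    base = Path(search_path)
    # Path.glob yields paths that still include `base`; sort them for a
    # stable order, then strip `base` so the names are search-path relative.
    files = sorted((base / subdir).glob("*.sql"))
    return [f.relative_to(base).as_posix() for f in files]
```

With the layout from the question and search_path pointing at the dags folder, this returns ['sql/dummy1.sql', 'sql/dummy2.sql'], the same list as the hard-coded version.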