airflow PostgresOperator report number of inserts/updates/deletes
Question:
I’m exploring replacing our home-built SQL file orchestration framework with Apache Airflow.
We currently have extensive logging of execution time, history, and the number of records INSERTED/UPDATED/DELETED. The first two are covered by Airflow's standard logging; however, I could not find a way to log the resulting row counts of the operations.
What would be the way to log these, preferably per SQL file? And how can I make them visible in a nice graph?
My simple example DAG looks like this:
import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG(
    dag_id="postgres_operator_dag",
    start_date=datetime.datetime(2023, 2, 2),
    schedule_interval=None,
    catchup=False,
) as dag:
    proc_r = PostgresOperator(
        task_id="proc_r",
        postgres_conn_id="postgres_dbad2a",
        sql=["001-test.sql", "002-test.sql"],
    )
    proc_r
Answers:
First, PostgresOperator is deprecated. You should use SQLExecuteQueryOperator instead (see source code).
I raised a PR to address this, which is expected to be released in the next version of apache-airflow-providers-common-sql.
For apache-airflow-providers-common-sql > 1.3.4:

SQLExecuteQueryOperator(
    ...,
    show_return_value_in_logs=True,
)
For apache-airflow-providers-common-sql <= 1.3.4:

The operator does not support printing the result to the log; it can only push the result value to XCom. You can handle it by writing a custom operator (note: this requires overriding a private function, which is risky, so use it with judgment):
from __future__ import annotations

from typing import Any, Sequence

from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator


class MySQLExecuteQueryOperator(SQLExecuteQueryOperator):
    def _process_output(
        self, results: list[Any], descriptions: list[Sequence[Sequence] | None]
    ) -> list[Any]:
        self.log.info("result is: %s", results)
        return results
Running:

MySQLExecuteQueryOperator(
    task_id="some_sql",
    conn_id="postgres_default",
    sql="SELECT 4*5",
)