Airflow Jinja Templating in params

Question:

I have an Airflow operator that queries Athena and accepts a Jinja-templated file as the query input. I usually pass variables such as table and database names to the template for create-table and add-partition statements. This works fine for predefined strings.

My task definition looks like this:

        db = 'sample_db'
        table = 'sample_table'
        out = 's3://sample'
        p1='2020'
        p2='1'

        add_partition_task= AWSAthenaOperator(
            task_id='add_partition',
            query='add_partition.sql',
            params={'database': db,
                    'table_name': table,
                    'p1': p1,
                    'p2': p2},
            database=db,
            output_location=out
        )

The SQL file being templated looks like:

ALTER TABLE {{ params.database }}.{{ params.table_name }} ADD IF NOT EXISTS
PARTITION (partition1= '{{ params.p1 }}', partition2 = '{{ params.p2 }}')

This templating works fine.

The next step is to set ‘partition1’ and ‘partition2’ from Jinja-templated values containing an XCom pull from an earlier task, which converts a date into a financial year and period. Using the date as the partition is a possibility, but I am interested in whether params can be forced to accept Jinja templates.

The code I would like to use looks like the following:

        db = 'sample_db'
        table = 'sample_table'
        out = 's3://sample'
        p1 = '{{ task_instance.xcom_pull(task_ids="convert_to_partition", key="p1") }}'
        p2 = '{{ task_instance.xcom_pull(task_ids="convert_to_partition", key="p2") }}'

        add_partition_task= AWSAthenaOperator(
            task_id='add_partition',
            query='add_partition.sql',
            params={'database': db,
                    'table_name': table,
                    'p1': p1,
                    'p2': p2},
            database=db,
            output_location=out
        )

So now params.p1 and params.p2 each contain a Jinja template. However, params does not appear to support Jinja templating: the rendered SQL contains the string literal ‘{{ task_instance….’ rather than the rendered XCom value.

Adding params to the template_fields in the operator implementation is not enough to force it to render the template. My operator only extends BaseOperator and uses an AthenaHook which extends AwsHook.
Does anyone have experience with passing templated variables in a params-like structure, or know of an alternative approach?

Asked By: cmclel

||

Answers:

Since AWSAthenaOperator has query as a templated field and accepts files with the .sql extension, you can put the Jinja expressions in the SQL file itself.

I modified your AWSAthenaOperator a bit to fit the example.

add_partition_task= AWSAthenaOperator(
    task_id='add_partition',
    query='add_partition.sql',
    params={
        'database': db,
        'table_name': table,
    }
)

Here is what the add_partition.sql could look like.

INSERT OVERWRITE TABLE {{ params.database }}.{{ params.table_name }} (day, month, year) 
SELECT * FROM db.table
WHERE p1 = "{{ task_instance.xcom_pull(task_ids='convert_to_partition', key='p1') }}" 
  AND p2 = "{{ task_instance.xcom_pull(task_ids='convert_to_partition', key='p2') }}"
;
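
As a rough illustration of why moving the expression into the file helps, here is a minimal sketch that uses jinja2 directly rather than Airflow. FakeTaskInstance is a hypothetical stand-in I made up for the real task instance, and the XCom value '2020' is assumed. The point it shows is that Jinja renders a template once: a string substituted in from params is inserted as-is and is not rendered a second time.

```python
from jinja2 import Template


class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's task instance (illustration only)."""

    def __init__(self, xcom):
        self._xcom = xcom

    def xcom_pull(self, task_ids, key):
        # Return the value an earlier task would have pushed under (task_ids, key).
        return self._xcom[(task_ids, key)]


# Assumed XCom contents; in Airflow these would come from convert_to_partition.
ti = FakeTaskInstance({("convert_to_partition", "p1"): "2020"})

# Case 1: the Jinja expression is passed in as a params value.
# It is substituted as plain text and never evaluated.
nested = Template("partition1 = '{{ params.p1 }}'").render(
    params={
        "p1": '{{ task_instance.xcom_pull(task_ids="convert_to_partition", key="p1") }}'
    },
    task_instance=ti,
)

# Case 2: the Jinja expression is part of the template text, so it is evaluated.
direct = Template(
    "partition1 = '{{ task_instance.xcom_pull("
    "task_ids=\"convert_to_partition\", key=\"p1\") }}'"
).render(task_instance=ti)

print(nested)  # still contains the unrendered {{ ... }} expression
print(direct)  # partition1 = '2020'
```

The same idea should carry over to the ALTER TABLE ... ADD PARTITION statement from the question: keep database and table_name in params, and write the xcom_pull expressions directly in add_partition.sql.
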
Answered By: Alan Ma