Has anyone managed to access SQL Server from the Docker image apache/airflow:2.0.1?

Question:

I’m trying out Apache Airflow with docker-compose, using the base image apache/airflow:2.0.1.

I’m following this tutorial https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#

How do you run a simple query to get data from a SQL Server database?

At this stage, I’d just like to see if it’s possible.

I’ve tried to extend the image:

FROM apache/airflow:2.0.1
RUN pip install apache-airflow-providers-microsoft-azure==1.2.0rc1
RUN pip install --no-cache-dir --user apache-airflow-providers-microsoft-mssql
# this fails
# RUN pip install --no-cache-dir --user apache-airflow-providers-odbc
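# (this likely fails because the ODBC provider pulls in pyodbc, which needs
# the unixODBC system headers that the base image does not ship)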

and am using this to get data:

from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook

def mssql_func(**kwargs):
    conn = MsSqlHook.get_connection(conn_id="mssql_default")
    hook = conn.get_hook()
    df = hook.get_pandas_df(sql="SELECT top 5 * FROM sometable")
    # do whatever you need on the df
    print(df)

Any ideas?

Asked By: Johnny


Answers:

I’m answering my own question here, mainly because my initial question didn’t give people a lot to go on. Much appreciation to all who responded regardless.

I needed to add a few packages to get the mssql connection working, so I extended the Docker image. There was a bug in the provider, so I needed to add apache-airflow-providers-microsoft-azure (pinned to a release candidate) too.

Dockerfile

FROM apache/airflow:2.0.1
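# the azure provider is pinned to a 1.2.0 release candidate to work around
# the provider bug mentioned above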
RUN pip install apache-airflow-providers-microsoft-azure==1.2.0rc1
RUN pip install apache-airflow-providers-microsoft-mssql

Build the new image and set the AIRFLOW_IMAGE_NAME environment variable so the docker-compose setup from the tutorial uses it:

docker build -t custom/apache:2.0.1 -f ./Dockerfile .
echo -e "AIRFLOW_IMAGE_NAME=custom/apache:2.0.1" >> .env

Here’s the code to get some data. I added the connection via the Connections screen in the Airflow UI.
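(As an aside, Airflow can also pick up connections from AIRFLOW_CONN_<CONN_ID> environment variables in URI form, e.g. AIRFLOW_CONN_MSSQL_DEFAULT=mssql://user:password@host:1433/dbname passed into the Airflow containers; user, password, host, and dbname here are placeholders, not values from this setup.)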

DAG

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow',
}

with DAG(
    'test_etl_mssqlhook',
    default_args=default_args,
    description='ETL DAG Test 3',
    schedule_interval=None,
    start_date=days_ago(2),
    tags=['test'],
) as dag:
    dag.doc_md = __doc__

    def start(**kwargs):
        print("MEH!")

    def extract(**kwargs):
        conn = MsSqlHook.get_connection(conn_id="mssql_default")
        hook = conn.get_hook()
        df = hook.get_pandas_df(sql="SELECT top 5 * FROM dbo.atable")
        # do whatever you need on the df
        print(df)

    start_task = PythonOperator(
        task_id='start',
        python_callable=start,
    )

    extract_task = PythonOperator(
        task_id='extract',
        python_callable=extract,
    )

    start_task >> extract_task
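
For what it’s worth, the hook can also be constructed directly from the connection id, skipping the get_connection()/get_hook() round trip. A minimal sketch, assuming the same mssql_default connection and dbo.atable table as above:

from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook

def extract_direct(**kwargs):
    # build the hook straight from the connection id; it resolves
    # "mssql_default" the same way get_connection() does
    hook = MsSqlHook(mssql_conn_id="mssql_default")
    df = hook.get_pandas_df(sql="SELECT top 5 * FROM dbo.atable")
    print(df)

Note that with schedule_interval=None the DAG only runs when triggered manually, either from the UI or with airflow dags trigger test_etl_mssqlhook.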
Answered By: Johnny