Has anyone managed to access SQL Server with Docker and apache/airflow:2.0.1?
Question:
I’m trying out Apache Airflow with docker-compose, using the base image apache/airflow:2.0.1.
I’m following this tutorial: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#
How do you run a simple query to get data from a SQL Server database?
At this stage, I’d just like to see if it’s possible.
I’ve tried extending the image:
FROM apache/airflow:2.0.1
RUN pip install apache-airflow-providers-microsoft-azure==1.2.0rc1
RUN pip install --no-cache-dir --user apache-airflow-providers-microsoft-mssql
# this fails
# RUN pip install --no-cache-dir --user apache-airflow-providers-odbc
and using this code to get data:
def mssql_func(**kwargs):
    conn = MsSqlHook.get_connection(conn_id="mssql_default")
    hook = conn.get_hook()
    df = hook.get_pandas_df(sql="SELECT TOP 5 * FROM sometable")
    # do whatever you need on the df
    print(df)
Any ideas?
Answers:
I’m answering my own question here, mainly because my initial question didn’t give people a lot to go on. Much appreciation to all that responded regardless.
I needed to add a few packages to get the mssql connection working, so I extended the Docker image. There was a bug, so I needed to add apache-airflow-providers-microsoft-azure too.
Dockerfile
FROM apache/airflow:2.0.1
RUN pip install apache-airflow-providers-microsoft-azure==1.2.0rc1
RUN pip install apache-airflow-providers-microsoft-mssql
Build the new image and set the env var so docker-compose uses it, as specified in the tutorial:
docker build -t custom/apache:2.0.1 -f ./Dockerfile .
echo -e "AIRFLOW_IMAGE_NAME=custom/apache:2.0.1" >> .env
Here’s the code to get some data. I added the connection via the Connections screen in the Airflow UI.
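As an aside, Airflow 2 can also pick connections up from AIRFLOW_CONN_<CONN_ID> environment variables instead of the UI, where the value is a connection URI. A minimal sketch (the host and credentials below are hypothetical placeholders) that sets the variable and sanity-checks the URI pieces with only the standard library:

```python
import os
from urllib.parse import urlsplit

# Airflow resolves the conn_id "mssql_default" from the
# AIRFLOW_CONN_MSSQL_DEFAULT environment variable when it is set.
# Host and credentials here are hypothetical placeholders.
uri = "mssql://sa:MyPassword@mssql-host:1433/mydb"
os.environ["AIRFLOW_CONN_MSSQL_DEFAULT"] = uri

# Sanity-check the pieces Airflow will parse out of the URI:
parts = urlsplit(uri)
print(parts.scheme, parts.hostname, parts.port, parts.path.lstrip("/"))
```

Note that with the docker-compose setup, the variable has to be set in the containers' environment (e.g. in docker-compose.yaml), not just in your shell.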
DAG
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook
from airflow.utils.dates import days_ago
default_args = {
    'owner': 'airflow',
}

with DAG(
    'test_etl_mssqlhook',
    default_args=default_args,
    description='ETL DAG Test 3',
    schedule_interval=None,
    start_date=days_ago(2),
    tags=['test'],
) as dag:
    dag.doc_md = __doc__

    def start(**kwargs):
        print("MEH!")

    def extract(**kwargs):
        conn = MsSqlHook.get_connection(conn_id="mssql_default")
        hook = conn.get_hook()
        df = hook.get_pandas_df(sql="SELECT TOP 5 * FROM dbo.atable")
        # do whatever you need on the df
        print(df)

    start_task = PythonOperator(
        task_id='start',
        python_callable=start,
    )
    extract_task = PythonOperator(
        task_id='extract',
        python_callable=extract,
    )

    start_task >> extract_task
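Since get_pandas_df returns a plain pandas DataFrame, the "do whatever you need on the df" step can be ordinary pandas code. A small sketch with made-up data and column names (the real frame depends on dbo.atable):

```python
import pandas as pd

# Stand-in for the frame returned by hook.get_pandas_df(...);
# the columns here are hypothetical.
df = pd.DataFrame({"Id": [1, 2, 3], "Name": ["a", "b", "c"]})

# Typical lightweight post-processing before loading elsewhere:
df = df.rename(columns=str.lower)  # normalise column names
df = df[df["id"] > 1]              # keep only the rows you need
print(df.to_csv(index=False))
```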