Airflow DAG can't find local file to upload on s3

Question:

I have created a DAG to upload a local file into a personal S3 Bucket. However, when accessing http://localhost:9099/home I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: ‘C:\Users\plata\OneDrive\Υπολογιστής\projects backups\airflow-sqlserver\dags\pricedata.xlsx’
Airflow error – broken dag

I have a Windows PC and I am running Airflow in a Docker container.

Here is the DAG’s code:

# airflow related
from airflow import DAG
from airflow.operators.python import PythonOperator
# other packages
from datetime import datetime
import boto3

with DAG(
    dag_id='file_to_s3',
    start_date=datetime(2022, 12, 5),
    catchup=False,
) as dag:
    pass


def file_to_s3():
    #Creating Session With Boto3.
    session = boto3.Session(
    aws_access_key_id='my_access_key_id',
    aws_secret_access_key='my_secret_access_key'
    )

    #Creating S3 Resource From the Session.
    s3 = session.resource('s3')

    result = s3.Bucket('flight-data-test-bucket').upload_file(r'C:\Users\plata\OneDrive\Υπολογιστής\projects backups\airflow-sqlserver\dags\pricedata.xlsx', 'pricedata.xlsx')

    return (result)


with DAG(
    dag_id='file_to_s3',
    start_date=datetime(2022, 12, 5),
    catchup=False
) as dag:
    # Upload the file
    task_file_to_s3 = PythonOperator(
        task_id='file_to_s3',
        python_callable=file_to_s3()
    )

I can’t understand why this happens, since I have already stored my local file in my "dags" folder:
pricedata.xlsx location

And my "dags" folder is already mounted in the docker-compose.yml file, which can be seen below:


  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    # For backward compatibility, with Airflow <2.3
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./data:/opt/airflow/data
  user: "${AIRFLOW_UID:-50000}:0"

Any ideas? Could this problem be caused by the fact that I am running Airflow on Windows through Docker?

Asked By: panos


Answers:

The file system of your Docker containers is not shared with Windows by default.

You can mount a drive so that you can persist files and share them between Windows and your Docker containers:

https://www.docker.com/blog/file-sharing-with-docker-desktop/

Note that in your DAG code you will need the file path as seen inside your Docker container.

With your docker-compose file, it looks like your xlsx file is mounted here:
./dags:/opt/airflow/dags

So I assume that in your DAG code you could try:

result = s3.Bucket('flight-data-test-bucket').upload_file('/opt/airflow/dags/pricedata.xlsx', 'pricedata.xlsx')

It might be a good idea to mount an additional volume with your project data outside of the dags folder.
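
If you would rather not hardcode the mount point, another option is to resolve the path relative to the DAG file itself. This is only a minimal sketch, assuming the xlsx sits next to the DAG file in the mounted dags folder; the bucket name and credential placeholders are taken from the question:

from pathlib import Path

import boto3

# Resolve the spreadsheet relative to this DAG file, so the code works
# no matter where the dags folder is mounted from on the host.
DAGS_DIR = Path(__file__).resolve().parent
LOCAL_FILE = DAGS_DIR / 'pricedata.xlsx'


def file_to_s3():
    # Placeholder credentials, as in the question.
    session = boto3.Session(
        aws_access_key_id='my_access_key_id',
        aws_secret_access_key='my_secret_access_key'
    )
    s3 = session.resource('s3')
    # upload_file takes the local path and the destination key.
    return s3.Bucket('flight-data-test-bucket').upload_file(str(LOCAL_FILE), 'pricedata.xlsx')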

Answered By: PandaBlue

I was able to fix it after changing the path to:

result = s3.Bucket('flight-data-test-bucket').upload_file('/opt/airflow/dags/pricedata.xlsx', 'pricedata.xlsx')

I also had to change python_callable=file_to_s3() to python_callable=file_to_s3.
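
For reference, the whole working DAG then looks roughly like this (same bucket name and credential placeholders as in the question):

# airflow related
from airflow import DAG
from airflow.operators.python import PythonOperator
# other packages
from datetime import datetime
import boto3


def file_to_s3():
    # Creating a session with boto3 (placeholder credentials).
    session = boto3.Session(
        aws_access_key_id='my_access_key_id',
        aws_secret_access_key='my_secret_access_key'
    )

    # Creating an S3 resource from the session.
    s3 = session.resource('s3')

    # Use the path as seen inside the container, not the Windows path.
    return s3.Bucket('flight-data-test-bucket').upload_file('/opt/airflow/dags/pricedata.xlsx', 'pricedata.xlsx')


with DAG(
    dag_id='file_to_s3',
    start_date=datetime(2022, 12, 5),
    catchup=False
) as dag:
    # Upload the file
    task_file_to_s3 = PythonOperator(
        task_id='file_to_s3',
        python_callable=file_to_s3  # pass the function itself, do not call it
    )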

Answered By: panos