Apache Airflow – connecting to AWS S3 error

Question:

I’m trying to get an S3 hook in Apache Airflow using the Connection object.

It looks like this:

class S3ConnectionHandler:
    def __init__(self):
        # values are read from configuration class, which loads from env. variables
        self._s3 = Connection(
            conn_type="s3",
            conn_id=config.AWS_CONN_ID,
            login=config.AWS_ACCESS_KEY_ID,
            password=config.AWS_SECRET_ACCESS_KEY,
            extra=json.dumps({"region_name": config.AWS_DEFAULT_REGION}),
        )

    @property
    def s3(self) -> Connection:
        return get_live_connection(self.logger, self._s3)

    @property
    def s3_hook(self) -> S3Hook:
        return self.s3.get_hook()

I get an error:

Broken DAG: [...] Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/connection.py", line 282, in get_hook
    return hook_class(**{conn_id_param: self.conn_id})
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/base_aws.py", line 354, in __init__
    raise AirflowException('Either client_type or resource_type must be provided.')
airflow.exceptions.AirflowException: Either client_type or resource_type must be provided.

Why does this happen? From what I understand, S3Hook calls the constructor of its parent class, AwsHook, and passes client_type as the "s3" string. How can I fix this?

I took this hook configuration from here.

EDIT: I even get the same error when directly creating the S3 hook:

    @property
    def s3_hook(self) -> S3Hook:
        #return self.s3.get_hook()
        return S3Hook(
            aws_conn_id=config.AWS_CONN_ID,
            region_name=self.config.AWS_DEFAULT_REGION,
            client_type="s3",
            config={"aws_access_key_id": self.config.AWS_ACCESS_KEY_ID, "aws_secret_access_key": self.config.AWS_SECRET_ACCESS_KEY}
        )
Asked By: qalis


Answers:

First of all, I suggest that you create an S3 connection. To do this, go to Admin >> Connections.

(screenshot: S3 connection form)

After that, and assuming that you want to load a file into an S3 bucket, you can write:

from airflow.providers.amazon.aws.hooks.s3 import S3Hook


def load_csv_S3():
    # Send a local CSV file to S3
    hook = S3Hook(aws_conn_id="s3_conn")
    hook.load_file(
        filename='/write_your_path_file/filename.csv',
        key='filename.csv',
        bucket_name="BUCKET_NAME",
        replace=True,
    )

Finally, you can check all of the S3Hook functions HERE.

Answered By: Alexbonella

None of the other answers worked; I couldn’t get around this. I ended up using the boto3 library directly, which also gave me more low-level flexibility than Airflow hooks offered.
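
For reference, a minimal sketch of that boto3 approach (bucket name, key and file path are placeholders; config is the same configuration object as in the question):

import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id=config.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=config.AWS_SECRET_ACCESS_KEY,
    region_name=config.AWS_DEFAULT_REGION,
)

# Upload a local file to the bucket
s3.upload_file(
    Filename="/path/to/filename.csv",
    Bucket="BUCKET_NAME",
    Key="filename.csv",
)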

Answered By: qalis

If you’re using Airflow 2, please refer to the new documentation – it can be kind of tricky, as most Google searches redirect you to the old one.

In my case I was using AwsHook and had to switch to AwsBaseHook, which seems to be the only and correct one for version 2. I had to switch the import path as well: the AWS hooks aren’t under contrib anymore, they are under providers.

And as you can see in the new documentation, you can pass either client_type or resource_type as an AwsBaseHook parameter, depending on which one you want to use. Once you do that, your problem should be solved.
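
A minimal sketch of what that looks like (the connection id is a placeholder):

from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook

# Passing client_type="s3" tells the hook which boto3 client to build
hook = AwsBaseHook(aws_conn_id="aws_default", client_type="s3")
s3_client = hook.get_conn()  # boto3 S3 client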

In case it helps someone, what worked for me is in my answer to a similar post: https://stackoverflow.com/a/73652781/4187360

Answered By: babis21