Apache Airflow – connecting to AWS S3 error
Question:
I’m trying to get an S3 hook in Apache Airflow using the Connection object.
It looks like this:
class S3ConnectionHandler:
    def __init__(self):
        # values are read from a configuration class, which loads from env variables
        self._s3 = Connection(
            conn_type="s3",
            conn_id=config.AWS_CONN_ID,
            login=config.AWS_ACCESS_KEY_ID,
            password=config.AWS_SECRET_ACCESS_KEY,
            extra=json.dumps({"region_name": config.AWS_DEFAULT_REGION}),
        )

    @property
    def s3(self) -> Connection:
        return get_live_connection(self.logger, self._s3)

    @property
    def s3_hook(self) -> S3Hook:
        return self.s3.get_hook()
I get an error:
Broken DAG: [...] Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/connection.py", line 282, in get_hook
return hook_class(**{conn_id_param: self.conn_id})
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/base_aws.py", line 354, in __init__
raise AirflowException('Either client_type or resource_type must be provided.')
airflow.exceptions.AirflowException: Either client_type or resource_type must be provided.
Why does this happen? From what I understand, S3Hook calls the constructor of its parent class, AwsHook, and passes client_type as the string "s3". How can I fix this?
I took this configuration for hook from here.
EDIT: I even get the same error when directly creating the S3 hook:
    @property
    def s3_hook(self) -> S3Hook:
        # return self.s3.get_hook()
        return S3Hook(
            aws_conn_id=config.AWS_CONN_ID,
            region_name=self.config.AWS_DEFAULT_REGION,
            client_type="s3",
            config={
                "aws_access_key_id": self.config.AWS_ACCESS_KEY_ID,
                "aws_secret_access_key": self.config.AWS_SECRET_ACCESS_KEY,
            },
        )
Answers:
First of all, I suggest that you create an S3 connection. To do this, go to Admin >> Connections.
After that, assuming you want to load a file into an S3 bucket, you can write:
def load_csv_S3():
    # Send to S3
    hook = S3Hook(aws_conn_id="s3_conn")
    hook.load_file(
        filename='/write_your_path_file/filename.csv',
        key='filename.csv',
        bucket_name="BUCKET_NAME",
        replace=True,
    )
Finally, you can check all the functions of S3Hook HERE
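If you prefer not to click through the Admin UI, Airflow can also read a connection from an environment variable named AIRFLOW_CONN_<CONN_ID> that holds a connection URI. A minimal sketch of building such a URI for the Amazon provider (the credentials and region below are placeholders):

```python
from urllib.parse import quote

def aws_conn_uri(access_key: str, secret_key: str, region: str) -> str:
    """Build an Airflow connection URI for the Amazon provider.

    Airflow reads connections from env vars named AIRFLOW_CONN_<CONN_ID>,
    e.g. AIRFLOW_CONN_S3_CONN, in the form
    <conn_type>://<login>:<password>@<host>/?<extras>.
    """
    # Credentials can contain '/' or '+', so they must be URL-encoded.
    return (
        f"aws://{quote(access_key, safe='')}:{quote(secret_key, safe='')}"
        f"@/?region_name={region}"
    )

# Example with placeholder credentials:
uri = aws_conn_uri("AKIAEXAMPLE", "secret/key+chars", "eu-west-1")
```

Exporting the result as AIRFLOW_CONN_S3_CONN makes S3Hook(aws_conn_id="s3_conn") resolve without any UI or database entry.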
No other answer worked; I couldn’t get around this. I ended up using the boto3 library directly, which also gave me more low-level flexibility than the Airflow hooks offered.
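For reference, a minimal boto3-only sketch. The environment-variable names mirror the config class in the question and are assumptions; boto3 would also pick up the standard AWS_* variables on its own, but being explicit keeps the wiring visible:

```python
import os

def s3_client_kwargs() -> dict:
    # Collect credentials from environment variables (assumed names,
    # matching the question's config class).
    return {
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
        "region_name": os.environ.get("AWS_DEFAULT_REGION", "us-east-1"),
    }

def upload_file(path: str, bucket: str, key: str) -> None:
    # Lazy import so the helper above can be used/tested without boto3.
    import boto3
    s3 = boto3.client("s3", **s3_client_kwargs())
    s3.upload_file(path, bucket, key)
```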
If you're using Airflow 2, please refer to the new documentation – it can be tricky, as most Google searches redirect you to the old docs.
In my case I was using AwsHook and had to switch to AwsBaseHook, which seems to be the correct one for version 2. I also had to change the import path: the AWS code is no longer under contrib; it now lives under providers.
As the new documentation shows, you can pass either client_type or resource_type as an AwsBaseHook parameter, depending on which one you want to use. Once you do that, your problem should be solved.
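A sketch of what that looks like on Airflow 2. The lazy import is only so this snippet reads standalone; in a DAG file you would import at module top level:

```python
def make_aws_hook(conn_id: str = "aws_default", client_type: str = "s3"):
    # Airflow 2.x: AwsHook (airflow.contrib) is gone; the replacement
    # lives in the Amazon provider package.
    from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook

    # AwsBaseHook requires one of client_type / resource_type --
    # omitting both raises the AirflowException from the question.
    return AwsBaseHook(aws_conn_id=conn_id, client_type=client_type)
```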
What worked for me, in case it helps someone, is in my answer to a similar post: https://stackoverflow.com/a/73652781/4187360