How to read a .sql file stored in S3 containing multiple SQL statements

Question:

I have a .sql file stored in an S3 location in AWS which contains multiple SQL statements separated by semicolons, as below:

Query1;
_______________
Query2;
_______________
Query3;

I tried using two methods in an AWS Glue job to read this S3 .sql file, but with no success:

  1. Method 1:
sql= open('s3://bucket1/a.sql','r').read().format(schema).split(';')

Error: no such file exists.

Even though the file is present at the S3 path, it seems the open() function doesn't work with S3 paths.

  2. Method 2:
obj=boto3.client('s3')

query=obj.get_object(Bucket='bucket1',Key='a.sql')

Issue: this method reads the full file, not the individual SQL statements.

***********************************************************************************

The main objective is to read the multiple SQL statements from this S3 .sql file and execute them against Redshift.

Asked By: Beginner


Answers:

Even though the file is present at the S3 path, it seems the open() function doesn't work with S3 paths.

Of course it doesn’t. That is a function for opening files on the local file system. It doesn’t have any idea how to parse an S3 path and connect over the network to S3 and perform S3 object storage requests.
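If you want to keep using open(), one workaround is to download the object to the local file system first and open it from there. A minimal sketch, assuming the bucket and key from your example and an arbitrary local path under /tmp:

import boto3

s3 = boto3.client('s3')
# Download the S3 object to a local file; open() can then read it normally.
s3.download_file('bucket1', 'a.sql', '/tmp/a.sql')   # '/tmp/a.sql' is just an example path
with open('/tmp/a.sql', 'r') as f:
    queries = f.read().split(';')

But there is no need to go through the local file system at all, as your second attempt shows.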


obj=boto3.client('s3')

query=obj.get_object(Bucket='bucket1',Key='a.sql')

Issue: this method reads the full file, not the individual SQL statements.

You had the split() function in your first attempt. You just need to use it again in your second attempt:

import boto3

obj = boto3.client('s3')
# read() returns bytes, so decode to a string before splitting on the semicolons
queries = obj.get_object(Bucket='bucket1', Key='a.sql')['Body'].read().decode('utf-8').split(';')

Then queries will be a list of your individual queries. So you can refer to them as queries[0], queries[1], etc.
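To then run those statements against Redshift, one option is to loop over the list, skip the empty string left behind by the trailing semicolon, and submit each statement. Below is a minimal sketch using the boto3 Redshift Data API; the cluster identifier, database, and user names are placeholders, not values from the question:

import boto3

redshift = boto3.client('redshift-data')
for q in queries:
    statement = q.strip()
    if not statement:
        # split(';') leaves an empty string after the trailing semicolon
        continue
    # execute_statement submits the query asynchronously; use describe_statement
    # with the returned Id if you need to wait for completion or check status.
    redshift.execute_statement(
        ClusterIdentifier='my-cluster',  # placeholder
        Database='dev',                  # placeholder
        DbUser='awsuser',                # placeholder
        Sql=statement,
    )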

Answered By: Mark B