How to unload a data table from AWS Redshift and save it to an S3 bucket using Python (example attached)?
Question:
I am trying to extract data from AWS Redshift tables and save it to an S3 bucket using Python. I have done the same in R, but I want to replicate it in Python. Here is the code I am using:
R
library(RPostgreSQL)  # provides the PostgreSQL DBI driver used below

drv <- dbDriver("PostgreSQL")
connection <- dbConnect(drv,
                        host = "xyz.amazonaws.com",
                        port = "abcd",
                        user = "a",
                        password = "b",
                        dbname = "DB")

# UNLOAD writes the query result from Redshift directly to S3
dbGetQuery(connection, "UNLOAD ('select COL1, COL2, COL3
    from xyz
    where user_name in (''ythm'')
    and customer = ''RANDOM''
    and utc_date between ''2021-10-01'' and ''2022-01-21''')
TO 's3://MYBUCKET/Industry_Raw_Data_'
CREDENTIALS
'aws_access_key_id=ABC;aws_secret_access_key=HYU'
DELIMITER '|'
ALLOWOVERWRITE
PARALLEL OFF;")
dbDisconnect(connection)
I have been able to connect to the AWS Redshift DB with the script below:
Python
import psycopg2
import pandas as pd

# Connect to the Redshift cluster (psycopg2 speaks the PostgreSQL protocol)
connection = psycopg2.connect(
    host="xyz.amazonaws.com",
    port="abcd",
    database="DB",
    user="a",
    password="b")
I am now trying to create the table extract and save it to the S3 bucket. Any suggestions on how to achieve that in Python?
Answers:
After creating the connection, you can run the same UNLOAD query.
You execute SQL statements by creating a cursor and calling its execute method (https://www.psycopg.org/docs/cursor.html?highlight=execute#cursor.execute):
sql = """UNLOAD ('select COL1,COL2,COL3
from xyz
where user_name in (''ythm'')
and customer=''RANDOM''
and utc_date between ''2021-10-01'' and ''2022-01-21''
')
TO 's3://MYBUCKET/Industry_Raw_Data_'
CREDENTIALS
'aws_access_key_id=ABC;aws_secret_access_key=HYU'
DELIMITER '|'
ALLOWOVERWRITE
PARALLEL OFF"""
cur = con.cursor()
cur.execute(sql)
con.commit()
con.close()
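If the cluster has an IAM role attached that is allowed to write to the bucket, the same UNLOAD can be authorized with IAM_ROLE instead of embedding access keys in the SQL. A minimal sketch of that variant (the role ARN below is a placeholder, not from the question):
Python
# Same UNLOAD, authorized via an IAM role attached to the cluster;
# replace the placeholder ARN with a real role that can write to the bucket.
sql = """UNLOAD ('select COL1, COL2, COL3
    from xyz
    where user_name in (''ythm'')
    and customer = ''RANDOM''
    and utc_date between ''2021-10-01'' and ''2022-01-21''')
TO 's3://MYBUCKET/Industry_Raw_Data_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
DELIMITER '|'
ALLOWOVERWRITE
PARALLEL OFF"""

cur = connection.cursor()
cur.execute(sql)
connection.commit()
connection.close()
This keeps long-lived secrets out of your source code and query logs, which is generally preferable to inline CREDENTIALS.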