How to unload a data table from AWS Redshift and save it to an S3 bucket using Python (example attached)?
Question:
I am trying to extract data from AWS Redshift tables and save it to an S3 bucket using Python. I have done the same in R, but I want to replicate it in Python. Here is the code I am using:
R
library(RPostgreSQL)  # provides the PostgreSQL DBI driver used below

drv <- dbDriver("PostgreSQL")
connection <- dbConnect(drv,
                        host = "xyz.amazonaws.com",
                        port = "abcd",
                        user = "a",
                        password = "b",
                        dbname = "DB")

# UNLOAD writes the query result from Redshift directly to S3
dbGetQuery(connection, "UNLOAD ('select COL1, COL2, COL3
    from xyz
    where user_name in (''ythm'')
    and customer = ''RANDOM''
    and utc_date between ''2021-10-01'' and ''2022-01-21''')
TO 's3://MYBUCKET/Industry_Raw_Data_'
CREDENTIALS
'aws_access_key_id=ABC;aws_secret_access_key=HYU'
DELIMITER '|'
ALLOWOVERWRITE
PARALLEL OFF;")
dbDisconnect(connection)
I have been able to connect to the AWS Redshift DB with the script below:
Python
import psycopg2
import pandas as pd

# Connect to the Redshift cluster (psycopg2 speaks the PostgreSQL protocol)
connection = psycopg2.connect(
    host="xyz.amazonaws.com",
    port="abcd",
    database="DB",
    user="a",
    password="b")
I am now trying to create the table extract and save it to the S3 bucket. Any suggestions on how to achieve that in Python?
Answers:
After creating the connection, you can run the same UNLOAD query.
You execute SQL statements by creating a cursor and calling its execute method (https://www.psycopg.org/docs/cursor.html?highlight=execute#cursor.execute):
sql = """UNLOAD ('select COL1,COL2,COL3
from xyz
where user_name in (''ythm'')
and customer=''RANDOM''
and utc_date between ''2021-10-01'' and ''2022-01-21''
')
TO 's3://MYBUCKET/Industry_Raw_Data_'
CREDENTIALS
'aws_access_key_id=ABC;aws_secret_access_key=HYU'
DELIMITER '|'
ALLOWOVERWRITE
PARALLEL OFF"""
cur = con.cursor()
cur.execute(sql)
con.commit()
con.close()
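If the cluster has an IAM role attached that is allowed to write to the bucket, the same UNLOAD can be authorized with IAM_ROLE instead of embedding access keys in the SQL. A minimal sketch of that variant (the role ARN below is a placeholder, not from the question):
Python
# Same UNLOAD, authorized via an IAM role attached to the cluster;
# replace the placeholder ARN with a real role that can write to the bucket.
sql = """UNLOAD ('select COL1, COL2, COL3
    from xyz
    where user_name in (''ythm'')
    and customer = ''RANDOM''
    and utc_date between ''2021-10-01'' and ''2022-01-21''')
TO 's3://MYBUCKET/Industry_Raw_Data_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
DELIMITER '|'
ALLOWOVERWRITE
PARALLEL OFF"""

cur = connection.cursor()
cur.execute(sql)
connection.commit()
connection.close()
This keeps long-lived secrets out of your source code and query logs, which is generally preferable to inline CREDENTIALS.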