PYTHON: Is there a way to insert JSON into a SUPER column in Redshift without escape characters?

Question:

I’m trying to save data returned from an API as JSON in a SUPER type column in a table on Redshift.

My request:

import requests

data = requests.get(url='https://economia.awesomeapi.com.br/json/last/USD-BRL')

I have a function to insert the data that runs like this:

    QUERY = f"""INSERT INTO {schema}.{table}(data) VALUES (%s)"""

    conn = self.__get_conn()
    cursor = conn.cursor()

    print('INSERT DATA')
    cursor.execute(query=QUERY, vars=([data, ]))
    conn.commit()

When I try to insert the data using data.json() I get the following error: psycopg2.ProgrammingError: can't adapt type 'dict'. So I used json.dumps(data.json()) to serialize it before inserting.
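For reference, a minimal sketch of both attempts, assuming the data, cursor, and QUERY objects from the snippets above:

    import json

    payload = data.json()  # dict parsed from the API response

    # Fails: psycopg2 has no adapter for a plain dict
    # cursor.execute(query=QUERY, vars=(payload,))

    # Works, but stores the value as a JSON *string*, hence the escapes below
    cursor.execute(query=QUERY, vars=(json.dumps(payload),))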

But when I look at the database the data has escape characters like this: "{\"code\": \"USD\", \"codein\": \"BRL\", \"name\": \"Dólar Americano/Real Brasileiro\", \"high\": \"5.2768\", \"low\": \"5.1848\", \"varBid\": \"-0.0264\", \"pctChange\": \"-0.5\", \"bid\": …

I want to use DBT to structure this dataset with JSON_PARSE() and CTEs on Redshift, but these escape characters get in the way.

What am I missing? Is there a different way to do it?

The table DDL:

CREATE TABLE IF NOT EXISTS public.raw_currency
(
    id BIGINT  DEFAULT "identity"(105800, 0, '1,1'::text) ENCODE az64
    ,"data" SUPER   ENCODE zstd
    ,stored_at TIMESTAMP WITHOUT TIME ZONE   ENCODE az64
    ,error_log VARCHAR(65535)   ENCODE lzo
)
Asked By: Luis Felipe


Answers:

INSERT is an anti-pattern with Redshift; single-row inserts are very slow. And in this case, it’s the conversion from JSON to string and back that’s tripping you up.

You want to use COPY when bulk-inserting data. Redshift’s COPY takes a format parameter, and JSON is one of many supported formats. Docs.

Note that to be copied, the data needs to be in S3, and your Redshift cluster needs access to the key in S3 where you are copying from. Lots of tutorials online for this.
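As a hedged sketch of that flow, assuming the data response and cursor/conn from the question, and with placeholder bucket, key, and IAM role values (FORMAT JSON 'noshred' is the COPY option that loads each JSON document into a single SUPER column):

    import json
    import boto3

    # Placeholders: substitute your own bucket, key, and IAM role ARN
    bucket, key = 'my-bucket', 'raw/usd_brl.json'

    # 1. Stage the API payload in S3 as a JSON document
    boto3.client('s3').put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(data.json()),
    )

    # 2. COPY it into the SUPER column without shredding it into columns
    COPY_QUERY = f"""
        COPY public.raw_currency ("data")
        FROM 's3://{bucket}/{key}'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
        FORMAT JSON 'noshred';
    """
    cursor.execute(COPY_QUERY)
    conn.commit()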

Answered By: tconbeer

"But when I look at the database the data has escape characters like this"

That’s because the data is being stored as a string, not as structured data. The string happens to be in valid JSON format, but it’s still just a string.

To store it as structured data, you have to use JSON_PARSE(), e.g. something like this:

import json

dic = {'some': 'data with "quotes" that might trip up other methods'}

# JSON_PARSE() converts the JSON text into a native SUPER value server-side
QUERY = f"""INSERT INTO {schema}.{table} (data) VALUES (JSON_PARSE(%s))"""

cursor.execute(query=QUERY, vars=(json.dumps(dic),))
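To sanity-check that the value landed as structured SUPER rather than a string, you can navigate into it with dot notation (table and column names taken from the DDL in the question; 'some' is the key from the example dict above):

    # Dot notation only works on a SUPER value, not on a plain string column
    cursor.execute('SELECT rc.data.some FROM public.raw_currency rc LIMIT 1;')
    print(cursor.fetchone())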
Answered By: mziwisky