Redshift Python connector: column names are byte strings

Question:

Suppose I have the following table in Redshift:

a | b
-----
1 | 2
3 | 4

If I want to extract it from Redshift to a pd.DataFrame I can do the following:

import redshift_connector
import pandas as pd

query = 'SELECT * FROM table'
conn = redshift_connector.connect(user=user, host=host, password=password, port=port, database=database)

df = pd.read_sql_query(query, conn)

I’m using the redshift_connector package. The problem is that the column names in df are byte strings:

df['a']

This raises a KeyError, since the column is actually named b'a'. Does anyone know of a workaround? I have already written code using psycopg2, which returns normal strings, so I would like a solution that doesn’t require changing much of that code.

Edit:

Versions

Python = 3.9.7

Redshift-connector = 2.0.889

Pandas = 1.2.5

Asked By: Bruno Mello

||

Answers:

You could fix this with one line:

df.columns = [col.decode("utf-8") for col in df.columns]
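If the same code has to run against connector versions that return bytes as well as versions that return normal strings, the one-liner above would fail on str names, which have no decode method. A minimal sketch of a more defensive variant follows; the helper name normalize_column_names is hypothetical and not part of redshift_connector or pandas.

```python
def normalize_column_names(names):
    """Return column names as str, decoding any bytes entries as UTF-8.

    Hypothetical helper: names that are not bytes are passed through unchanged.
    """
    return [
        name.decode("utf-8") if isinstance(name, (bytes, bytearray)) else name
        for name in names
    ]

# Usage with the DataFrame from the question:
# df.columns = normalize_column_names(df.columns)
```

Because str names pass through untouched, this should be safe to leave in place after upgrading the connector.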

Or, instead of using pd.read_sql_query, use the cursor-based approach suggested in the documentation:

cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("SELECT * FROM table")

result: pd.DataFrame = cursor.fetch_dataframe()
Answered By: Pavel Slepiankou

This was fixed in v2.0.908 of redshift-connector, so upgrading to that version or later should resolve it.
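To see whether an environment already has that release, something like the following sketch could be used. It assumes the package is installed under the distribution name redshift-connector and that its version string consists of plain numeric parts; the helper has_column_name_fix is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (2, 0, 908)  # release named in this answer


def has_column_name_fix(version_string):
    """Compare a dotted numeric version string against FIXED_IN.

    Assumes purely numeric parts such as "2.0.889".
    """
    parts = tuple(int(p) for p in version_string.split(".")[:3])
    return parts >= FIXED_IN


try:
    installed = version("redshift-connector")
    print(installed, "includes fix:", has_column_name_fix(installed))
except PackageNotFoundError:
    print("redshift-connector is not installed")
```

With the version from the question (2.0.889) the check reports that the fix is missing.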

Answered By: brooke-white