Redshift Python connector: column names are byte strings
Question:
Suppose I have the following table in redshift:
a | b
-----
1 | 2
3 | 4
If I want to extract it from Redshift into a pd.DataFrame, I can do the following:
import redshift_connector
import pandas as pd
query = 'SELECT * FROM table'
conn = redshift_connector.connect(user=user, host=host, password=password, port=port, database=database)
df = pd.read_sql_query(query, conn)
I'm using the redshift_connector package. The problem is that the column names in df are byte strings:
df['a']
This raises an error, since the name of the column is b'a'. Does anyone know a workaround for this? I have already written code using psycopg2, which uses normal strings, so I would like a solution that doesn't require changing too much of the code.
Edit:
Versions
Python = 3.9.7
Redshift-connector = 2.0.889
Pandas = 1.2.5
Answers:
You could fix this with one line:
df.columns = [col.decode("utf-8") for col in df.columns]
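If the same code has to run against both affected and fixed versions of the connector, the one-liner above fails on names that are already str, since str has no decode method. A minimal sketch of a more defensive variant follows; the helper name decode_column_names is hypothetical, not part of pandas or redshift_connector:

```python
def decode_column_names(columns, encoding="utf-8"):
    """Return column names as str, decoding only the ones that are bytes."""
    return [
        col.decode(encoding) if isinstance(col, bytes) else col
        for col in columns
    ]


# Byte-string names, as described in the question, mixed with a name
# that is already a normal string.
names = decode_column_names([b"a", b"b", "c"])
print(names)
```

With a DataFrame this would be applied as df.columns = decode_column_names(df.columns).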
Or, instead of using pd.read_sql_query, use the cursor approach suggested in the documentation:
cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("SELECT * FROM table")
result: pd.DataFrame = cursor.fetch_dataframe()
This was fixed in v2.0.908 of redshift-connector, so upgrading the package should also resolve it.
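To decide at runtime whether the decode workaround is still needed, the installed version can be compared against 2.0.908. The sketch below is only an illustration: the function name has_column_name_fix is hypothetical, and it assumes purely numeric dotted version strings such as the ones quoted in this thread.

```python
def has_column_name_fix(version, fixed_in="2.0.908"):
    """Return True if `version` is at least the release that fixed the issue."""

    def as_tuple(v):
        # Assumes purely numeric dotted versions, e.g. "2.0.889".
        return tuple(int(part) for part in v.split("."))

    return as_tuple(version) >= as_tuple(fixed_in)


print(has_column_name_fix("2.0.889"))  # version from the question
print(has_column_name_fix("2.0.908"))  # version reported as fixed
```

The installed version string can usually be obtained with importlib.metadata.version, assuming the distribution is registered under the name redshift-connector.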