Python pandas to_sql populates all rows with NA
Question:
I am new to Python and want to upload a pandas DataFrame to a table in a Snowflake database. The desired behavior is to replace the table if it already exists. Here's my code:
#Import the required modules and packages
from snowflake.connector.pandas_tools import pandas
from snowflake.connector.pandas_tools import pd_writer
from snowflake.sqlalchemy import URL
import sqlalchemy
from sqlalchemy import create_engine
import os
# Import the data from a local CSV
my_data_frame = pandas.read_csv("my_data_frame.csv", sep=',', low_memory=False)
# Connect to Snowflake
engine = create_engine(URL(
    account='my_snowflake_account',
    user='my_snowflake_id',
    password='my_snowflake_password',
    database='my_database',
    schema='my_schema',
    warehouse='my_warehouse',
    role='my_role',
))
connection = engine.connect()
# Push my local data frame into a new table
my_data_frame.to_sql('new_table_name_on_Snowflake', engine,
                     index=False, method=pd_writer, if_exists='replace')
The code runs. It creates a table and assigns the correct column names to it. However, every row in every column is populated with NA. I suspect it has to do with data types. How can I resolve this?
Please note that I specify method=pd_writer, as suggested in Snowflake's documentation:
https://docs.snowflake.com/en/user-guide/python-connector-api.html#pd_writer
Inspecting the data frame with my_data_frame.dtypes returns:
column1 object
column2 int64
column3 object
column4 object
column5 object
column6 object
column7 int64
column8 int64
Answers:
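Upper-casing the column names solves the problem. The likely reason is that Snowflake stores unquoted identifiers in upper case, while pd_writer quotes the DataFrame's column names, so lower-case names fail to match the columns of the newly created table. Insert the following before calling to_sql: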
my_data_frame.columns = map(str.upper, my_data_frame.columns)
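If some column labels are not strings (integers, for example), map(str.upper, ...) raises a TypeError. A minimal sketch of a more tolerant variant follows; the helper name upper_case_columns and the sample labels are hypothetical and only illustrate the idea:

```python
def upper_case_columns(columns):
    """Return the labels upper-cased; non-string labels go through str() first."""
    return [str(name).upper() for name in columns]


# Hypothetical labels standing in for my_data_frame.columns
example_columns = ["column1", "Column2", "column_3"]
renamed = upper_case_columns(example_columns)
print(renamed)

# With a real DataFrame, apply it before calling to_sql:
# my_data_frame.columns = upper_case_columns(my_data_frame.columns)
```

The helper returns a plain list, which pandas accepts when assigning to my_data_frame.columns.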