Pandas .to_sql() Output exceeds the size limit

Question:

I’m using pyodbc and SQLAlchemy to insert data into a table in SQL Server, and I’m getting this error:

https://i.stack.imgur.com/miSp9.png

Here are snippets of the functions I use.

This is the function I use to connect to SQL Server (using fast_executemany):

import pyodbc
from urllib.parse import quote_plus
from sqlalchemy import create_engine

def connect(server, database):
    global cnxn_str, cnxn, cur, quoted, engine
    cnxn_str = ("Driver={SQL Server Native Client 11.0};"
                f"Server={server};"
                f"Database={database};"
                "UID=<user>;"
                "PWD=<password>;")
    cnxn = pyodbc.connect(cnxn_str)
    cur = cnxn.cursor()
    cur.fast_executemany = True

    quoted = quote_plus(cnxn_str)
    engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted), fast_executemany=True)

And this is the function I’m using to query the data and insert it into SQL Server:

import pandas as pd

def insert_to_sql_server():
    global df, np_array

    # build the dataframe from the numpy object array; the first row holds the column names
    df = pd.DataFrame(np_array[1:,], columns=np_array[0])

    # add new columns, data processing
    df['comp_key'] = df['col1'] + "-" + df['col2'].astype(str)
    df['comp_key2'] = df['col3'] + "-" + df['col4'].astype(str) + "-" + df['col5'].astype(str)
    df['comp_statusID'] = df['col6'] + "-" + df['col7'].astype(str)

    convert_dict = {'col1': 'string', 'col2': 'string', ..., 'col_n': 'string'}

    # convert data types of columns from object to string
    df = df.astype(convert_dict)

    connect(<server>, <database>)
    cur.rollback()

    # delete old records
    cur.execute("DELETE FROM <table>")
    cur.commit()

    # insert the dataframe into the table
    df.to_sql(<table name>, engine, index=False,
              if_exists='replace', schema='dbo', chunksize=1000, method='multi')

The insert function runs for about 30 minutes before finally returning the error message.

I encountered no errors when doing it with a smaller df size. The current df size I have is 27,963 rows and 9 columns. One thing which I think contributes to the error is the length of the strings. By default the numpy array is dtype='<U25', but I had to override this to dtype='object' because it was truncating the text data.
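
For illustration, a minimal sketch (with a made-up string, not the asker's actual data) of why the fixed-width dtype truncates while object does not:

import numpy as np

# a fixed-width unicode dtype silently cuts longer strings off at 25 characters
a = np.array(['this string is longer than twenty-five characters'], dtype='<U25')
print(a[0])   # 'this string is longer tha'

# an object array stores full Python strings, so nothing is truncated
b = np.array(['this string is longer than twenty-five characters'], dtype=object)
print(b[0])   # full text preserved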

I’m out of ideas because it seems like the error is referring to limitations of either Pandas or the SQL Server, which I’m not familiar with.

Thanks

Asked By: user19498404


Answers:

Thanks for all the input (still new here)! I accidentally stumbled upon the solution, which is to reduce the chunksize in df.to_sql from

df.to_sql(chunksize=1000)

to

df.to_sql(chunksize=200)

After digging, it turns out there’s a limitation on the SQL Server side: a single statement accepts at most 2100 parameters, and with method='multi' each row in a chunk adds one bound parameter per column (https://discuss.dizzycoding.com/to_sql-pyodbc-count-field-incorrect-or-syntax-error/).
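
A rough sketch of that reasoning, assuming the 2100-parameter cap is what you are hitting (safe_chunksize and the table name 'mytable' below are illustrative, not part of the original code):

def safe_chunksize(frame, param_limit=2100):
    # with method='multi', each row in a chunk consumes one bound parameter
    # per column, and SQL Server caps a single statement at 2100 parameters
    return max(1, (param_limit // len(frame.columns)) - 1)

df.to_sql('mytable', engine, index=False, if_exists='replace',
          schema='dbo', chunksize=safe_chunksize(df), method='multi')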

Answered By: user19498404

In my case, I had the same "Output exceeds the size limit" error, and I fixed it by adding method='multi' to df.to_sql().
First I tried the chunksize solution and it didn’t work.
So check whether you’re in the same scenario!

with engine.connect().execution_options(autocommit=True) as conn:
    df.to_sql('mytable', con=conn, method='multi', if_exists='replace', index=True)
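
As a side note, the autocommit execution option used above was deprecated in SQLAlchemy 1.4 and removed in 2.0; on those versions a rough equivalent (reusing df and engine from above, with 'mytable' still as the illustrative table name) is engine.begin(), which commits the transaction when the block exits:

# engine.begin() opens a connection inside a transaction and commits on exit
with engine.begin() as conn:
    df.to_sql('mytable', con=conn, method='multi', if_exists='replace', index=True)
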
Answered By: santiago-giordano