Python: does cursor.execute() load all data into memory?

Question:

I am trying to query a large table (10 million rows) while avoiding running out of memory, but I am not familiar with Python and am confused by the differing opinions about execute(), the cursor iterator, and fetchone().

Am I right to assume that cursor.execute() does not load all the data into memory, and that only when I call fetchone() will it load one row of data?

from mysql.connector import MySQLConnection


def query():
    conn = MySQLConnection(host=conf['host'],
                           port=conf['port'],
                           user=conf['user'],
                           password=conf['password'],
                           database=conf['database'])
    cursor = conn.cursor(buffered=True)
    cursor.execute('SELECT * FROM TABLE')  # 10 million rows

Does this cursor iterator do the same thing as fetchone()?

for row in cursor:
    print(row)

Is my code snippet safe for handling 10 million rows of data? If not, how can I iterate over the data without running out of memory?

Asked By: Timothy


Answers:

Taken from the MySQL documentation:

The fetchone() method is used by fetchall() and fetchmany(). It is also used when a cursor is used as an iterator.

The following example shows two equivalent ways to process a query result. The first uses fetchone() in a while loop, the second uses the cursor as an iterator:

# Using a while loop
cursor.execute("SELECT * FROM employees")
row = cursor.fetchone()
while row is not None:
  print(row)
  row = cursor.fetchone()

# Using the cursor as iterator
cursor.execute("SELECT * FROM employees")
for row in cursor:
  print(row)

It also states that:

You must fetch all rows for the current query before executing new statements using the same connection.

If you are worried about memory usage, you should use fetchmany(n) in a while loop until you have fetched all of the results, for example wrapped in a generator function:

def iter_rows(cursor, arraysize=1000):
    """An iterator that uses fetchmany to keep memory usage down."""
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result
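As a sanity check of the fetchmany pattern, here is a minimal self-contained sketch. It uses sqlite3 rather than mysql.connector only so that it runs without a database server; both are PEP 249 drivers and expose the same fetchmany() cursor method, but the table name and data here are made up for illustration.

```python
import sqlite3


def iter_rows(cursor, arraysize=1000):
    """Yield rows one at a time, fetching them in batches of `arraysize`
    via fetchmany() so at most one batch is held in memory."""
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result


# Demonstration data (hypothetical table `t`), using sqlite3 as a
# stand-in PEP 249 driver.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

cur.execute("SELECT n FROM t ORDER BY n")
rows = list(iter_rows(cur, arraysize=3))
print(rows)  # [(0,), (1,), (2,), ..., (9,)]
```

With mysql.connector the same generator works unchanged on an unbuffered cursor; only the connection setup differs.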

This behavior adheres to PEP 249, which describes how and which methods database connectors should implement. A partial answer is given in this thread.

Basically, the implementation of fetchall vs. fetchmany vs. fetchone is left to the developers of the library, depending on the database's capabilities. For fetchmany and fetchone, it would make sense for the unfetched/remaining results to be kept server-side until they are requested by another call or the cursor object is destroyed.

So, in conclusion, I think it is safe to assume that calling the execute method does not, in this case (mysql.connector), dump all the data from the query into memory.

Answered By: Milos Grujic

My first suggestion is to use from mysql.connector import connect, which by default will use the C extension (CMySQLConnection), instead of from mysql.connector import MySQLConnection (pure Python).

If for some reason you want to use the pure Python version, you can pass use_pure=True to connect().

The second suggestion is to paginate the results. If you use a buffered cursor, it will fetch the entire result set from the server, which you probably don't want with 10M rows.
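One way to paginate is keyset pagination: instead of OFFSET (whose cost grows with the offset, as the LIMIT-optimization page below explains), seek past the last id seen. This sketch again uses sqlite3 so it runs without a server; the `items` table, its columns, and the page size are hypothetical, but the same SQL pattern applies to MySQL.

```python
import sqlite3


def paginated(conn, page_size=1000):
    """Yield all rows of the (hypothetical) `items` table page by page,
    seeking on the indexed `id` column instead of using OFFSET."""
    last_id = -1
    while True:
        cur = conn.execute(
            "SELECT id, val FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        )
        page = cur.fetchall()
        if not page:
            break
        for row in page:
            yield row
        last_id = page[-1][0]  # resume after the last id of this page


# Demonstration with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"v{i}") for i in range(10)])

total = sum(1 for _ in paginated(conn, page_size=4))
print(total)  # 10
```

Each query touches only one page of rows, so memory stays bounded regardless of table size.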

Here are some references:

https://dev.mysql.com/doc/refman/8.0/en/limit-optimization.html

https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursorbuffered.html

Answered By: Nuno Mariz