How do cursors work in Python's DB-API?

Question:

I have been using Python with RDBMSs (MySQL and PostgreSQL), and I have noticed that I really do not understand how to use a cursor.

Usually, you have your script connect to the DB via a client DB-API module (like psycopg2 or MySQLdb):

connection = psycopg2.connect(host='otherhost', etc)

Then you create a cursor:

cursor = connection.cursor()

Then you can issue queries and commands:

cursor.execute("SELECT * FROM etc")

Now where is the result of the query, I wonder? Is it on the server, or a little on my client and a little on my server? And then, if we need to access some results, we fetch them:

row = cursor.fetchone()

or

rows = cursor.fetchmany()
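For reference, here is roughly how I consume results in batches today (the table and column names are just placeholders):

cursor.execute("SELECT id, name FROM mytable")
while True:
    batch = cursor.fetchmany(size=100)  # ask the driver for up to 100 rows
    if not batch:
        break                           # an empty list means the result set is exhausted
    for row in batch:
        print(row)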

Now let’s say I do not retrieve all the rows and decide to execute another query. What will happen to the previous results? Is there an overhead?

Also, should I create a cursor for every form of command and continuously reuse it for those same commands somehow? I heard psycopg2 can somehow optimize commands that are executed many times but with different values. How, and is it worth it?

Thanks!

Asked By: Nicholas Leonard


Answers:

Assuming you’re using PostgreSQL, the cursors are probably implemented using the database’s native cursor API. You may want to look at the source code of pg8000, a pure-Python PostgreSQL DB-API module, to see how it handles cursors. You might also like to look at the PostgreSQL documentation on cursors.
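To make “native cursor” concrete: at the SQL level the protocol looks roughly like this, driven here through an ordinary DB-API cursor inside a transaction (a hand-written sketch, not pg8000’s actual internals; names are illustrative):

cursor.execute("DECLARE my_cur CURSOR FOR SELECT * FROM mytable")
cursor.execute("FETCH FORWARD 100 FROM my_cur")  # the server sends one batch
batch = cursor.fetchall()                        # only that batch is on the client
cursor.execute("CLOSE my_cur")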

Answered By: kquinn

If you look at the MySQLdb documentation, you can see that it implements several different cursor strategies. So the general answer is: it depends.

Edit: The MySQLdb API documentation has some info on how each cursor type behaves. The standard cursor stores the result set on the client, so I assume there is an overhead if you don’t retrieve all the result rows: even the rows you don’t fetch have to be transferred to the client (potentially over the network). My guess is that it is not that different from PostgreSQL.
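For example, MySQLdb also ships a server-side cursor class that streams rows instead of buffering them on the client; a sketch (connection parameters are placeholders):

import MySQLdb
import MySQLdb.cursors

# SSCursor keeps the result set on the server and streams rows on demand,
# unlike the default cursor, which buffers everything on the client.
connection = MySQLdb.connect(host='otherhost', user='me', passwd='secret',
                             db='mydb', cursorclass=MySQLdb.cursors.SSCursor)
cursor = connection.cursor()
cursor.execute("SELECT * FROM mytable")
for row in cursor:  # rows arrive incrementally from the server
    print(row)
cursor.close()      # drain/close before issuing another query on this connection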

When you want to optimize SQL statements that you call repeatedly with many values, you should look at cursor.executemany(). It prepares a SQL statement so that it doesn’t need to be parsed every time you call it:

cur.executemany('INSERT INTO mytable (col1, col2) VALUES (%s, %s)',
                [('val1', 1), ('val2', 2)])

Answered By: unbeknown

Yeah, I know this question is months old. 😛

DB-API’s cursor appears to be closely modeled on SQL cursors. As far as resource (row) management is concerned, DB-API does not specify whether the client must retrieve all the rows or DECLARE an actual SQL cursor. As long as the fetchXXX interfaces do what they’re supposed to, DB-API is happy.

As far as psycopg2 cursors are concerned (as you may well know), “unnamed DB-API cursors” fetch the entire result set, which AFAIK is buffered in memory by libpq. “Named DB-API cursors” (a psycopg2 concept that may not be portable) request rows on demand via the fetchXXX methods.
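In psycopg2 that looks like this (a sketch; the cursor name, connection parameters, and query are placeholders):

import psycopg2

connection = psycopg2.connect(host='otherhost', dbname='mydb')
# Passing a name makes psycopg2 DECLARE a server-side cursor, so rows are
# requested on demand instead of being buffered entirely by libpq.
cursor = connection.cursor(name='my_named_cursor')
cursor.itersize = 1000  # rows pulled per network round trip while iterating
cursor.execute("SELECT * FROM mytable")
for row in cursor:
    print(row)
cursor.close()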

As “unbeknown” noted, executemany can be used to optimize multiple runs of the same command. However, it doesn’t address the need for prepared statements: when repeated executions of a statement with different parameter sets are not directly sequential, executemany() performs just as well as execute(). DB-API does “provide” driver authors with the ability to cache executed statements, but the implementation (what’s the scope/lifetime of the statement?) is undefined, so it’s impossible to set expectations across DB-API implementations.
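If you really need a prepared statement, one workaround is to drop down to PostgreSQL’s own PREPARE/EXECUTE from any DB-API cursor (a sketch; the statement and table names are placeholders):

# The server parses and plans the statement once...
cursor.execute(
    "PREPARE my_insert (text, int) AS "
    "INSERT INTO mytable (col1, col2) VALUES ($1, $2)")

# ...and each EXECUTE reuses that plan with new parameters.
cursor.execute("EXECUTE my_insert (%s, %s)", ('val1', 1))
cursor.execute("EXECUTE my_insert (%s, %s)", ('val2', 2))

cursor.execute("DEALLOCATE my_insert")  # prepared statements are per-session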

If you are loading lots of data into PostgreSQL, I would strongly recommend trying to find a way to use COPY.
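With psycopg2, for example, COPY is exposed through cursor.copy_from (a sketch; the buffer contents and table name are placeholders):

import io

import psycopg2

connection = psycopg2.connect(host='otherhost', dbname='mydb')
cursor = connection.cursor()

# COPY streams the whole data set in a single command, which is much
# faster than row-by-row INSERTs. Here the source is an in-memory
# tab-separated buffer; a real load would usually read from a file.
data = io.StringIO("val1\t1\nval2\t2\n")
cursor.copy_from(data, 'mytable', columns=('col1', 'col2'))
connection.commit()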

Answered By: jwp