psycopg2: update multiple rows with one query

Question:

I tried to update multiple rows (approx. 350000) with a single query by implementing the following function:

def update_items(rows_to_update):
    sql_query = """UPDATE contact as t SET
                    name = e.name
                    FROM (VALUES %s) AS e(id, name)
                    WHERE e.id = t.id;"""

    conn = get_db_connection()
    cur = conn.cursor()

    psycopg2.extras.execute_values(
        cur, sql_query, rows_to_update, template=None, page_size=100
    )

When I ran the function above, only 31 records were updated. I then tried updating row by row with the following function:

def update_items_row_by_row(rows_to_update):
    sql_query = """UPDATE contact SET name = %s WHERE id = %s"""
    conn = get_db_connection()
    with tqdm(total=len(rows_to_update)) as pbar:
        for id, name in rows_to_update:
            cur = conn.cursor()
            # Execute the UPDATE statement
            cur.execute(sql_query, (name, id))
            # Commit the change to the database
            conn.commit()
            cur.close()
            pbar.update(1)

The latter has updated all the records so far but is very slow (estimated to finish in 9 hours).
Does anyone know an efficient way to update multiple records?

Asked By: Niko Gamulin

Answers:

Splitting the list into chunks of size equal to page_size worked well:

def update_items(rows_to_update):
    sql_query = """UPDATE contact as t SET
                    name = data.name
                    FROM (VALUES %s) AS data (id, name)
                    WHERE t.id = data.id"""
    conn = get_db_connection()
    cur = conn.cursor()
    n = 100
    with tqdm(total=len(rows_to_update)) as pbar:
        for i in range(0, len(rows_to_update), n):
            psycopg2.extras.execute_values(
                cur, sql_query, rows_to_update[i:i + n], template=None, page_size=n
            )
            conn.commit()
            pbar.update(cur.rowcount)
    cur.close()
    conn.close()
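
The slicing step in the loop above can be looked at on its own, without a database. The following is only a minimal sketch and not part of the original answer: the helper name iter_chunks and the sample rows are hypothetical, chosen for illustration. It shows that slicing with a fixed step covers every row exactly once, including a shorter final batch.

```python
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[int, str]  # (id, name), the same shape as rows_to_update


def iter_chunks(rows: Sequence[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of `rows`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        # Slicing past the end is safe: the last chunk is simply shorter.
        yield list(rows[start:start + size])


# Hypothetical sample data, for illustration only.
rows = [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Di"), (5, "Eve")]
batches = list(iter_chunks(rows, 2))
print(batches)
```

Each batch could then be passed to execute_values in place of rows_to_update[i:i + n].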
Answered By: Niko Gamulin

The problem with your original function appears to be that it never calls commit. When you execute an INSERT or UPDATE query with psycopg2, a transaction is opened, and its changes are not persisted until commit is called. See my edits to your function (towards the bottom).

def update_items(rows_to_update):
    sql_query = """UPDATE contact as t SET
                name = e.name
                FROM (VALUES %s) AS e(id, name)
                WHERE e.id = t.id;"""

    conn = get_db_connection()
    cur = conn.cursor()

    psycopg2.extras.execute_values(cur, sql_query, rows_to_update)

    ## solution below ##
    conn.commit()  # <- We MUST commit to persist the updated data
    cur.close()
    conn.close()
    return "success :)"
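
To see the effect of a missing commit without a PostgreSQL server, here is a small sketch that uses the standard-library sqlite3 module as a stand-in. This is an assumption made for illustration: sqlite3 is not psycopg2, but it follows the same DB-API pattern in which a pending transaction is discarded when the connection is closed without commit. The function name name_after_update and the one-row table are hypothetical.

```python
import os
import sqlite3
import tempfile


def name_after_update(commit: bool) -> str:
    """Update one row, optionally commit, then read it back on a new connection."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO contact VALUES (1, 'old')")
        conn.commit()  # persist the starting state

        # The UPDATE implicitly opens a transaction (stand-in for psycopg2 behavior).
        conn.execute("UPDATE contact SET name = ? WHERE id = ?", ("new", 1))
        if commit:
            conn.commit()
        conn.close()  # closing without commit discards the pending change

        check = sqlite3.connect(path)
        (name,) = check.execute("SELECT name FROM contact WHERE id = 1").fetchone()
        check.close()
        return name


print(name_after_update(commit=False))  # prints "old"
print(name_after_update(commit=True))   # prints "new"
```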

If you don’t want to call conn.commit() after every statement, you can enable autocommit on the connection, for example:

conn = get_db_connection()
conn.set_session(autocommit=True)
Answered By: Jon