Aggregating save()s in Django?

Question:

I’m using Django with an sqlite backend, and write performance is a problem. I may graduate to a “proper” db at some stage, but for the moment I’m stuck with sqlite. I think that my write performance problems are probably related to the fact that I’m creating a large number of rows, and presumably each time I save() one it’s locking, unlocking and syncing the DB on disk.

How can I aggregate a large number of save() calls into a single database operation?

Asked By: kdt


Answers:

“How can I aggregate a large number of save() calls into a single database operation?”

You don’t need to. Django already manages a cache for you. You can’t improve its DB caching by trying to fuss around with saves.

“write performance problems are probably related to the fact that I’m creating a large number of rows”

Correct.

SQLite is pretty slow. That’s the way it is. Queries are faster than in most other databases. Writes are pretty slow.

Consider a more serious architectural change. Are you loading rows during a web transaction (i.e., bulk uploading files and loading the DB from those files)?

If you’re doing bulk loading inside a web transaction, stop. You need to do something smarter. Use celery or use some other “batch” facility to do your loads in the background.

We try to limit ourselves to file validation in a web transaction and do the loads when the user’s not waiting for their page of HTML.

Answered By: S.Lott

EDITED: commit_on_success is deprecated and was removed in Django 1.8. Use transaction.atomic instead. See Fraser Harris’s answer.

Actually this is easier to do than you think. You can use transactions in Django. These batch database operations (specifically save, insert and delete) into one operation. I’ve found the easiest one to use is commit_on_success. Essentially you wrap your database save operations into a function and then use the commit_on_success decorator.

from django.db.transaction import commit_on_success

@commit_on_success
def lot_of_saves(queryset):
    for item in queryset:
        modify_item(item)
        item.save()

This gives a huge speed increase. You’ll also get the benefit of roll-backs if any of the items fail. If you have millions of save operations, you may have to commit them in blocks using commit_manually and transaction.commit(), but I’ve rarely needed that.
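
To illustrate why wrapping the saves in one transaction helps, here is a minimal sketch that bypasses Django and talks to SQLite through Python's standard sqlite3 module. The table name `entry`, the helper `insert_rows` and the row count are made up for the illustration, and the actual timings depend on your disk and SQLite settings.

```python
import sqlite3
import tempfile
import time
from pathlib import Path


def insert_rows(db_path, n, single_transaction):
    """Insert n rows, committing either once at the end or after every row."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS entry (headline TEXT)")
    conn.commit()
    start = time.perf_counter()
    for i in range(n):
        conn.execute("INSERT INTO entry (headline) VALUES (?)", (f"Headline #{i}",))
        if not single_transaction:
            conn.commit()  # one commit (and one disk sync) per row
    if single_transaction:
        conn.commit()  # a single commit for the whole batch
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
    conn.close()
    return count, elapsed


with tempfile.TemporaryDirectory() as tmp:
    per_row = insert_rows(Path(tmp) / "per_row.db", 200, single_transaction=False)
    batched = insert_rows(Path(tmp) / "batched.db", 200, single_transaction=True)
    print(f"commit per row:     {per_row[0]} rows in {per_row[1]:.4f}s")
    print(f"single transaction: {batched[0]} rows in {batched[1]:.4f}s")
```

Both runs write the same rows; the only difference is how many times the database has to commit.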

Answered By: JudoWill

New as of Django 1.6 is atomic, a simple API to control DB transactions. Copied verbatim from the docs:

atomic is usable both as a decorator:

from django.db import transaction

@transaction.atomic
def viewfunc(request):
    # This code executes inside a transaction.
    do_stuff()

and as a context manager:

from django.db import transaction

def viewfunc(request):
    # This code executes in autocommit mode (Django's default).
    do_stuff()

    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()

Legacy django.db.transaction functions autocommit(), commit_on_success(), and commit_manually() have been deprecated and will be removed in Django 1.8.
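
For very large jobs, committing in blocks can also be done with atomic. The following is only a sketch: `save_in_blocks` and its parameters are hypothetical names, and the transaction wrapper is passed in as an argument so the helper can be tried without a Django project (it defaults to a no-op context manager).

```python
from contextlib import nullcontext
from itertools import islice


def save_in_blocks(items, block_size, atomic=nullcontext):
    """Call save() on every item, opening one transaction per block.

    `atomic` is any zero-argument callable returning a context manager;
    in a Django project you would pass django.db.transaction.atomic.
    """
    iterator = iter(items)
    saved = 0
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return saved
        with atomic():  # one commit per block instead of one per row
            for item in block:
                item.save()
                saved += 1
```

In a Django project the call would look something like `save_in_blocks(items, 1000, atomic=transaction.atomic)`, where `items` is an iterable of model instances. Note that each block commits independently, so a failure rolls back only the block being saved at that moment, not the blocks already committed.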

Answered By: Fraser Harris

I think this is the method you are looking for: https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create

Code copied from the docs:

Entry.objects.bulk_create([
    Entry(headline='This is a test'),
    Entry(headline='This is only a test'),
])

Which in practice, would look like:

my_entries = list()
for i in range(100):
    my_entries.append(Entry(headline='Headline #'+str(i)))

Entry.objects.bulk_create(my_entries)

According to the docs, this creates all the objects in a single query by default, regardless of the size of the list. The exception is SQLite, where the default batching keeps each query to at most 999 variables. The atomic decorator, by contrast, still issues one query per save().

There is an important distinction to make. It sounds like, from the OP’s question, that he is attempting to bulk create rather than bulk save. The atomic decorator is the fastest solution for saving, but not for creating.
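
If the list of new objects is too large to build in memory at once, bulk_create can be fed in batches. This is a rough sketch along the lines of the pattern in the Django docs; `bulk_create_in_batches` is a hypothetical helper, and the manager is passed in as an argument so the function itself has no Django import.

```python
from itertools import islice


def bulk_create_in_batches(manager, objs, batch_size=500):
    """Feed a (possibly lazy) iterable of unsaved instances to bulk_create.

    `manager` is expected to be a Django model manager such as Entry.objects;
    only its bulk_create(objs, batch_size) method is used.
    """
    iterator = iter(objs)
    created = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return created
        manager.bulk_create(batch, batch_size)
        created += len(batch)
```

With the Entry model from above, a call might look like `bulk_create_in_batches(Entry.objects, (Entry(headline='Headline #' + str(i)) for i in range(100000)))`. Keep in mind that bulk_create does not call each model's save() method and does not send the pre_save and post_save signals.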

Answered By: Chris Conlan