How to make Django QuerySet bulk delete() more efficient

Question:

Setup:
Django 1.1.2, MySQL 5.1

Problem:

Blob.objects.filter(foo=foo) \
            .filter(status=Blob.PLEASE_DELETE) \
            .delete()

This snippet results in the ORM first generating a SELECT * FROM xxx_blob WHERE ... query, then running DELETE FROM xxx_blob WHERE id IN (BLAH), where BLAH is a ridiculously long list of ids. Since I’m deleting a large number of blobs, this makes both me and the DB very unhappy.

Is there a reason for this? I don’t see why the ORM can’t convert the above snippet into a single DELETE query. Is there a way to optimize this without resorting to raw SQL?

Asked By: svintus


Answers:

Not without writing your own custom SQL or managers or something; they are apparently working on it though.

http://code.djangoproject.com/ticket/9519

Answered By: Dominic Santos

Bulk delete is already part of Django.

Keep in mind that this will, whenever possible, be executed purely in SQL.

Answered By: David

For those who are still looking for an efficient way to bulk delete in Django, here’s a possible solution:

delete() can be slow for two reasons: 1) Django has to make sure cascade deletion works properly, so it looks for foreign key references to your models; 2) Django has to send the pre_delete and post_delete signals for your models.

If you know your models don’t have cascade deleting or signals to be handled, you can accelerate this process by resorting to the private API _raw_delete as follows:

queryset._raw_delete(queryset.db)

Please note that Django already tries to handle these cases efficiently, though the raw delete is, in many situations, much faster.
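As a minimal sketch of the approach above, the call can be wrapped in a small helper so the caveats live in one place. The name raw_delete is hypothetical and not part of Django, and _raw_delete is private API, so its behaviour and return value may differ between Django versions.

```python
def raw_delete(queryset):
    """Delete every row matched by `queryset` with one DELETE statement.

    Hypothetical helper, not part of Django. It relies on the private
    QuerySet._raw_delete method, so cascade handling is skipped and the
    pre_delete/post_delete signals are not sent.
    """
    # queryset.db is the alias of the database this queryset runs against.
    return queryset._raw_delete(queryset.db)
```

With the model from the question, this would be called as raw_delete(Blob.objects.filter(foo=foo, status=Blob.PLEASE_DELETE)). Only use it when nothing depends on cascades or delete signals for that model.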

Answered By: Anoyz