How to make Django QuerySet bulk delete() more efficient
Question:
Setup:
Django 1.1.2, MySQL 5.1
Problem:
Blob.objects.filter(foo=foo).filter(status=Blob.PLEASE_DELETE).delete()
This snippet results in the ORM first generating a SELECT * from xxx_blob where ... query, then doing a DELETE from xxx_blob where id in (BLAH); where BLAH is a ridiculously long list of IDs. Since I’m deleting a large number of blobs, this makes both me and the DB very unhappy.
Is there a reason for this? I don’t see why the ORM can’t convert the above snippet into a single DELETE query. Is there a way to optimize this without resorting to raw SQL?
Answers:
Not without writing your own custom SQL or a custom manager; the Django developers are apparently working on it, though.
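To illustrate what "custom SQL" could look like here, below is a minimal sketch of the single-statement delete the question asks for. It uses Python's built-in sqlite3 module as a stand-in for the MySQL database so it runs on its own. The table name xxx_blob comes from the question; the column names foo and status, the PLEASE_DELETE value, and the helper delete_marked_blobs are assumptions made for illustration. Inside a Django project you would run the statement through Django's own database connection instead, and the placeholder syntax would differ.

```python
import sqlite3

# Assumed value; the question only shows the constant's name.
PLEASE_DELETE = 1

# In-memory database standing in for the real MySQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxx_blob ("
    "id INTEGER PRIMARY KEY, foo INTEGER, status INTEGER)"
)
conn.executemany(
    "INSERT INTO xxx_blob (foo, status) VALUES (?, ?)",
    [(7, PLEASE_DELETE)] * 3 + [(7, 0), (8, PLEASE_DELETE)],
)


def delete_marked_blobs(conn, foo):
    """Delete matching rows with one DELETE, without fetching ids first."""
    cur = conn.execute(
        "DELETE FROM xxx_blob WHERE foo = ? AND status = ?",
        (foo, PLEASE_DELETE),
    )
    conn.commit()
    return cur.rowcount


deleted = delete_marked_blobs(conn, 7)
remaining = conn.execute("SELECT COUNT(*) FROM xxx_blob").fetchone()[0]
```

The database does the filtering itself, so no list of ids ever travels between the application and the server.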
Bulk delete is already part of Django: calling delete() on a QuerySet deletes every row it matches. Keep in mind that this will, whenever possible, be executed purely in SQL.
For those who are still looking for an efficient way to bulk delete in Django, here’s a possible solution:
The reason delete() may be so slow is twofold: 1) Django has to make sure cascading deletes work properly, so it looks for foreign key references to your models; 2) Django has to send the pre_delete and post_delete signals for your models.
If you know your models don’t have cascading deletes or signals to be handled, you can speed this process up by resorting to the private API _raw_delete, as follows:
queryset._raw_delete(queryset.db)
More details in here. Please note that Django already tries to handle these cases well on its own, though using the raw delete is, in many situations, much more efficient.