Removing objects from a queryset by ID: optimal implementation
Question:
I have a queryset of 1,000,000 MyModel objects. Every object in the queryset has an ID.
I pass in a list of IDs to remove, for example:
    ids = ['1', '23', '117', ...] # len = 100000
I then need to delete the objects in the queryset whose IDs are in this list. The list can contain IDs that don't exist.
What is the best way to do this?
My current approach:
    for id in ids:
        obj = MyModel.objects.filter(pk=id)
        if obj:  # evaluates the queryset: one SELECT per ID
            obj.delete()  # plus a DELETE for every ID that exists
I'm not sure about this, because it makes 100,000 queries to the database. Maybe it makes sense to convert the queryset to a list and then filter it by ID? On the other hand, if there are a million objects in the database and only one needs to be deleted, that approach has the opposite problem: it loads a million objects into memory just to remove one.
Answers:
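The current approach queries the database once per ID, i.e. 100,000 times, plus at least one more query for every ID that exists (if obj: evaluates the queryset with a SELECT, and obj.delete() then issues a DELETE). Instead, use the __in lookup and delete in bulk by calling delete() directly after filter():
    MyModel.objects.filter(pk__in=ids).delete()
Now the deletion runs as a single query, or a few if Django first has to collect related objects for cascading deletes or for pre_delete/post_delete signal receivers. IDs that don't exist simply match no rows, so no separate existence check is needed. delete() returns a tuple whose first element is the number of objects deleted.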
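To make the query counts concrete, here is a rough sketch that uses Python's built-in sqlite3 module instead of Django, so it runs on its own. The table name mymodel and all helper functions are hypothetical stand-ins for MyModel, and the SQL only approximates what Django emits (the real delete() may also collect related rows for cascades and signals).

```python
import sqlite3


def make_db(n_rows):
    """Build an in-memory table standing in for MyModel (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO mymodel (id) VALUES (?)",
        ((i,) for i in range(1, n_rows + 1)),
    )
    return conn


def delete_one_by_one(conn, ids):
    """Mimic the loop from the question: one SELECT per ID, plus a DELETE if it exists.

    Returns the number of SQL statements issued.
    """
    queries = 0
    for pk in ids:
        queries += 1
        found = conn.execute("SELECT 1 FROM mymodel WHERE id = ?", (pk,)).fetchone()
        if found:
            queries += 1
            conn.execute("DELETE FROM mymodel WHERE id = ?", (pk,))
    return queries


def delete_in_bulk(conn, ids):
    """Mimic filter(pk__in=ids).delete(): one DELETE ... WHERE id IN (...).

    Returns the number of SQL statements issued.
    """
    if not ids:
        return 0
    placeholders = ",".join("?" * len(ids))  # e.g. "?,?,?,?"
    conn.execute(f"DELETE FROM mymodel WHERE id IN ({placeholders})", list(ids))
    return 1


def remaining_ids(conn):
    """List the IDs still present in the table, in ascending order."""
    return [row[0] for row in conn.execute("SELECT id FROM mymodel ORDER BY id")]


ids = [1, 23, 117, 5000]  # 5000 does not exist in a 1000-row table

loop_db = make_db(1000)
bulk_db = make_db(1000)
print(delete_one_by_one(loop_db, ids))  # 7 (4 SELECTs + 3 DELETEs)
print(delete_in_bulk(bulk_db, ids))     # 1
print(remaining_ids(loop_db) == remaining_ids(bulk_db))  # True
```

With the 100,000 IDs from the question, the loop would issue 100,000 SELECTs plus one DELETE per existing ID, while the bulk version issues one statement. One caveat for this raw sqlite3 sketch: SQLite caps the number of bound parameters per statement (by default 999 before SQLite 3.32.0 and 32766 since), so a list that long would have to be split into chunks here.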
Edit:
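If you also need to know which of the given IDs actually exist (for example, to report the missing ones), intersect them with the existing primary keys. The IDs in the question are strings, while an integer primary key comes back from values_list() as int, so convert them first; otherwise the intersection is always empty. Keep in mind that this loads every primary key in the table into memory:
    all_existing_ids = set(MyModel.objects.values_list('pk', flat=True))
    all_valid_ids = set(map(int, ids)) & all_existing_ids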
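The IDs in the question are strings ('1', '23', ...), while an integer primary key comes back as int, so a plain set intersection of the two finds nothing. This sketch reproduces that without a database. The function split_ids is hypothetical, and existing_pks stands in for the result of values_list('pk', flat=True), assuming an integer primary key.

```python
def split_ids(requested_ids, existing_pks):
    """Return (valid, missing) sets of integer IDs.

    requested_ids may be strings such as '23' (as in the question);
    existing_pks is assumed to hold integer primary keys.
    """
    requested = {int(pk) for pk in requested_ids}  # normalize '23' -> 23
    existing = set(existing_pks)
    valid = requested & existing     # IDs that can actually be deleted
    missing = requested - existing   # IDs with no matching row
    return valid, missing


ids = ['1', '23', '117', '5000']
existing_pks = range(1, 1001)  # hypothetical: primary keys 1..1000

# Intersecting the raw strings with integers finds nothing:
print(set(ids) & set(existing_pks))  # set()

valid, missing = split_ids(ids, existing_pks)
print(sorted(valid))    # [1, 23, 117]
print(sorted(missing))  # [5000]
```

Using int() raises ValueError on a malformed ID such as 'abc', which is arguably preferable to silently ignoring it.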