How to filter objects for count annotation in Django?
Question:
Consider simple Django models Event
and Participant
:
class Event(models.Model):
title = models.CharField(max_length=100)
class Participant(models.Model):
event = models.ForeignKey(Event, db_index=True)
is_paid = models.BooleanField(default=False, db_index=True)
It’s easy to annotate events query with total number of participants:
events = Event.objects.all().annotate(participants=models.Count('participant'))
How to annotate with count of participants filtered by is_paid=True
?
I need to query all events regardless of number of participants, e.g. I don’t need to filter by annotated result. If there are 0
participants, that’s ok, I just need 0
in annotated value.
The example from documentation doesn’t work here, because it excludes objects from query instead of annotating them with 0
.
Update. Django 1.8 has new conditional expressions feature, so now we can do like this:
events = Event.objects.all().annotate(paid_participants=models.Sum(
models.Case(
models.When(participant__is_paid=True, then=1),
default=0,
output_field=models.IntegerField()
)))
Update 2. Django 2.0 has new Conditional aggregation feature, see the accepted answer below. This also works in Django 3.x
Answers:
UPDATE
The sub-query approach which I mention is now supported in Django 1.11 via subquery-expressions.
Event.objects.annotate(
num_paid_participants=Subquery(
Participant.objects.filter(
is_paid=True,
event=OuterRef('pk')
).values('event')
.annotate(cnt=Count('pk'))
.values('cnt'),
output_field=models.IntegerField()
)
)
I prefer this over aggregation (sum+case), because it should be faster and easier to be optimized (with proper indexing).
For older version, the same can be achieved using .extra
Event.objects.extra(select={'num_paid_participants': "
SELECT COUNT(*)
FROM `myapp_participant`
WHERE `myapp_participant`.`is_paid` = 1 AND
`myapp_participant`.`event_id` = `myapp_event`.`id`"
})
Just discovered that Django 1.8 has new conditional expressions feature, so now we can do like this:
events = Event.objects.all().annotate(paid_participants=models.Sum(
models.Case(
models.When(participant__is_paid=True, then=1),
default=0, output_field=models.IntegerField()
)))
I would suggest to use the .values
method of your Participant
queryset instead.
For short, what you want to do is given by:
Participant.objects
.filter(is_paid=True)
.values('event')
.distinct()
.annotate(models.Count('id'))
A complete example is as follow:
-
Create 2 Event
s:
event1 = Event.objects.create(title='event1')
event2 = Event.objects.create(title='event2')
-
Add Participant
s to them:
part1l = [Participant.objects.create(event=event1, is_paid=((_%2) == 0))
for _ in range(10)]
part2l = [Participant.objects.create(event=event2, is_paid=((_%2) == 0))
for _ in range(50)]
-
Group all Participant
s by their event
field:
Participant.objects.values('event')
> <QuerySet [{'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, '...(remaining elements truncated)...']>
Here distinct is needed:
Participant.objects.values('event').distinct()
> <QuerySet [{'event': 1}, {'event': 2}]>
What .values
and .distinct
are doing here is that they are creating two buckets of Participant
s grouped by their element event
. Note that those buckets contain Participant
.
-
You can then annotate those buckets as they contain the set of original Participant
. Here we want to count the number of Participant
, this is simply done by counting the id
s of the elements in those buckets (since those are Participant
):
Participant.objects
.values('event')
.distinct()
.annotate(models.Count('id'))
> <QuerySet [{'event': 1, 'id__count': 10}, {'event': 2, 'id__count': 50}]>
-
Finally you want only Participant
with a is_paid
being True
, you may just add a filter in front of the previous expression, and this yield the expression shown above:
Participant.objects
.filter(is_paid=True)
.values('event')
.distinct()
.annotate(models.Count('id'))
> <QuerySet [{'event': 1, 'id__count': 5}, {'event': 2, 'id__count': 25}]>
The only drawback is that you have to retrieve the Event
afterwards as you only have the id
from the method above.
Conditional aggregation in Django 2.0+ allows you to further reduce the amount of faff this has been in the past. This will also use Postgres’ filter
logic, which is somewhat faster than a sum-case (I’ve seen numbers like 20-30% bandied around).
Anyway, in your case, we’re looking at something as simple as:
from django.db.models import Q, Count
events = Event.objects.annotate(
paid_participants=Count('participants', filter=Q(participants__is_paid=True))
)
There’s a separate section in the docs about filtering on annotations. It’s the same stuff as conditional aggregation but more like my example above. Either which way, this is a lot healthier than the gnarly subqueries I was doing before.
What result I am looking for:
- People (assignee) who have tasks added to a report. – Total Unique
count of People
- People who have tasks added to a report but, for task
whose billability is more than 0 only.
In general, I would have to use two different queries:
Task.objects.filter(billable_efforts__gt=0)
Task.objects.all()
But I want both in one query. Hence:
Task.objects.values('report__title').annotate(withMoreThanZero=Count('assignee', distinct=True, filter=Q(billable_efforts__gt=0))).annotate(totalUniqueAssignee=Count('assignee', distinct=True))
Result:
<QuerySet [{'report__title': 'TestReport', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}, {'report__title': 'Utilization_Report_April_2019', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}]>
For Django 3.x just write filter after the annotate:
User.objects.values('user_id')
.annotate(sudo_field=models.Count('likes'))
.filter(sudo_field__gt=100)
In above sudo_field
is not a model field in User Model and here we are filtering the users who have likes (or xyz) more than 100.
Consider simple Django models Event
and Participant
:
class Event(models.Model):
title = models.CharField(max_length=100)
class Participant(models.Model):
event = models.ForeignKey(Event, db_index=True)
is_paid = models.BooleanField(default=False, db_index=True)
It’s easy to annotate events query with total number of participants:
events = Event.objects.all().annotate(participants=models.Count('participant'))
How to annotate with count of participants filtered by is_paid=True
?
I need to query all events regardless of number of participants, e.g. I don’t need to filter by annotated result. If there are 0
participants, that’s ok, I just need 0
in annotated value.
The example from documentation doesn’t work here, because it excludes objects from query instead of annotating them with 0
.
Update. Django 1.8 has new conditional expressions feature, so now we can do like this:
events = Event.objects.all().annotate(paid_participants=models.Sum(
models.Case(
models.When(participant__is_paid=True, then=1),
default=0,
output_field=models.IntegerField()
)))
Update 2. Django 2.0 has new Conditional aggregation feature, see the accepted answer below. This also works in Django 3.x
UPDATE
The sub-query approach which I mention is now supported in Django 1.11 via subquery-expressions.
Event.objects.annotate(
num_paid_participants=Subquery(
Participant.objects.filter(
is_paid=True,
event=OuterRef('pk')
).values('event')
.annotate(cnt=Count('pk'))
.values('cnt'),
output_field=models.IntegerField()
)
)
I prefer this over aggregation (sum+case), because it should be faster and easier to be optimized (with proper indexing).
For older version, the same can be achieved using .extra
Event.objects.extra(select={'num_paid_participants': "
SELECT COUNT(*)
FROM `myapp_participant`
WHERE `myapp_participant`.`is_paid` = 1 AND
`myapp_participant`.`event_id` = `myapp_event`.`id`"
})
Just discovered that Django 1.8 has new conditional expressions feature, so now we can do like this:
events = Event.objects.all().annotate(paid_participants=models.Sum(
models.Case(
models.When(participant__is_paid=True, then=1),
default=0, output_field=models.IntegerField()
)))
I would suggest to use the .values
method of your Participant
queryset instead.
For short, what you want to do is given by:
Participant.objects
.filter(is_paid=True)
.values('event')
.distinct()
.annotate(models.Count('id'))
A complete example is as follow:
-
Create 2
Event
s:event1 = Event.objects.create(title='event1') event2 = Event.objects.create(title='event2')
-
Add
Participant
s to them:part1l = [Participant.objects.create(event=event1, is_paid=((_%2) == 0)) for _ in range(10)] part2l = [Participant.objects.create(event=event2, is_paid=((_%2) == 0)) for _ in range(50)]
-
Group all
Participant
s by theirevent
field:Participant.objects.values('event') > <QuerySet [{'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, '...(remaining elements truncated)...']>
Here distinct is needed:
Participant.objects.values('event').distinct() > <QuerySet [{'event': 1}, {'event': 2}]>
What
.values
and.distinct
are doing here is that they are creating two buckets ofParticipant
s grouped by their elementevent
. Note that those buckets containParticipant
. -
You can then annotate those buckets as they contain the set of original
Participant
. Here we want to count the number ofParticipant
, this is simply done by counting theid
s of the elements in those buckets (since those areParticipant
):Participant.objects .values('event') .distinct() .annotate(models.Count('id')) > <QuerySet [{'event': 1, 'id__count': 10}, {'event': 2, 'id__count': 50}]>
-
Finally you want only
Participant
with ais_paid
beingTrue
, you may just add a filter in front of the previous expression, and this yield the expression shown above:Participant.objects .filter(is_paid=True) .values('event') .distinct() .annotate(models.Count('id')) > <QuerySet [{'event': 1, 'id__count': 5}, {'event': 2, 'id__count': 25}]>
The only drawback is that you have to retrieve the Event
afterwards as you only have the id
from the method above.
Conditional aggregation in Django 2.0+ allows you to further reduce the amount of faff this has been in the past. This will also use Postgres’ filter
logic, which is somewhat faster than a sum-case (I’ve seen numbers like 20-30% bandied around).
Anyway, in your case, we’re looking at something as simple as:
from django.db.models import Q, Count
events = Event.objects.annotate(
paid_participants=Count('participants', filter=Q(participants__is_paid=True))
)
There’s a separate section in the docs about filtering on annotations. It’s the same stuff as conditional aggregation but more like my example above. Either which way, this is a lot healthier than the gnarly subqueries I was doing before.
What result I am looking for:
- People (assignee) who have tasks added to a report. – Total Unique
count of People - People who have tasks added to a report but, for task
whose billability is more than 0 only.
In general, I would have to use two different queries:
Task.objects.filter(billable_efforts__gt=0)
Task.objects.all()
But I want both in one query. Hence:
Task.objects.values('report__title').annotate(withMoreThanZero=Count('assignee', distinct=True, filter=Q(billable_efforts__gt=0))).annotate(totalUniqueAssignee=Count('assignee', distinct=True))
Result:
<QuerySet [{'report__title': 'TestReport', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}, {'report__title': 'Utilization_Report_April_2019', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}]>
For Django 3.x just write filter after the annotate:
User.objects.values('user_id')
.annotate(sudo_field=models.Count('likes'))
.filter(sudo_field__gt=100)
In above sudo_field
is not a model field in User Model and here we are filtering the users who have likes (or xyz) more than 100.