Django Left Outer Join
Question:
I have a website where users can see a list of movies, and create reviews for them.
The user should be able to see the list of all the movies. Additionally, IF they have reviewed the movie, they should be able to see the score that they gave it. If not, the movie is just displayed without the score.
They do not care at all about the scores provided by other users.
Consider the following models.py
from django.contrib.auth.models import User
from django.db import models
class Topic(models.Model):
name = models.TextField()
def __str__(self):
return self.name
class Record(models.Model):
user = models.ForeignKey(User)
topic = models.ForeignKey(Topic)
value = models.TextField()
class Meta:
unique_together = ("user", "topic")
What I essentially want is this
select * from bar_topic
left join (select topic_id as tid, value from bar_record where user_id = 1)
on tid = bar_topic.id
Consider the following test.py
for context:
from django.test import TestCase
from bar.models import *
from django.db.models import Q
class TestSuite(TestCase):
def setUp(self):
t1 = Topic.objects.create(name="A")
t2 = Topic.objects.create(name="B")
t3 = Topic.objects.create(name="C")
# 2 for Johnny
johnny = User.objects.create(username="Johnny")
johnny.record_set.create(topic=t1, value=1)
johnny.record_set.create(topic=t3, value=3)
# 3 for Mary
mary = User.objects.create(username="Mary")
mary.record_set.create(topic=t1, value=4)
mary.record_set.create(topic=t2, value=5)
mary.record_set.create(topic=t3, value=6)
def test_raw(self):
print('nrawn---')
with self.assertNumQueries(1):
topics = Topic.objects.raw('''
select * from bar_topic
left join (select topic_id as tid, value from bar_record where user_id = 1)
on tid = bar_topic.id
''')
for topic in topics:
print(topic, topic.value)
def test_orm(self):
print('normn---')
with self.assertNumQueries(1):
topics = Topic.objects.filter(Q(record__user_id=1)).values_list('name', 'record__value')
for topic in topics:
print(*topic)
BOTH tests should print the exact same output, however, only the raw version spits out the correct table of results:
raw
---
A 1
B None
C 3
the orm instead returns this
orm
---
A 1
C 3
Any attempt to join back the rest of the topics, those that have no reviews from user “johnny”, result in the following:
orm
---
A 1
A 4
B 5
C 3
C 6
How can I accomplish the simple behavior of the raw query with the Django ORM?
edit: This sort of works but seems very poor:
topics = Topic.objects.filter(record__user_id=1).values_list('name', 'record__value')
noned = Topic.objects.exclude(record__user_id=1).values_list('name')
for topic in chain(topics, noned):
...
edit: This works a little bit better, but still bad:
topics = Topic.objects.filter(record__user_id=1).annotate(value=F('record__value'))
topics |= Topic.objects.exclude(pk__in=topics)
orm
---
A 1
B 5
C 3
Answers:
First of all, there is no a way (atm Django 1.9.7) to have a representation with Django’s ORM of the raw query you posted, exactly as you want; however, you can get the same desired result with something like:
>>> Topic.objects.annotate(
f=Case(
When(
record__user=johnny,
then=F('record__value')
),
output_field=IntegerField()
)
).order_by(
'id', 'name', 'f'
).distinct(
'id', 'name'
).values_list(
'name', 'f'
)
>>> [(u'A', 1), (u'B', None), (u'C', 3)]
>>> Topic.objects.annotate(f=Case(When(record__user=may, then=F('record__value')), output_field=IntegerField())).order_by('id', 'name', 'f').distinct('id', 'name').values_list('name', 'f')
>>> [(u'A', 4), (u'B', 5), (u'C', 6)]
Here the SQL generated for the first query:
>>> print Topic.objects.annotate(f=Case(When(record__user=johnny, then=F('record__value')), output_field=IntegerField())).order_by('id', 'name', 'f').distinct('id', 'name').values_list('name', 'f').query
>>> SELECT DISTINCT ON ("payments_topic"."id", "payments_topic"."name") "payments_topic"."name", CASE WHEN "payments_record"."user_id" = 1 THEN "payments_record"."value" ELSE NULL END AS "f" FROM "payments_topic" LEFT OUTER JOIN "payments_record" ON ("payments_topic"."id" = "payments_record"."topic_id") ORDER BY "payments_topic"."id" ASC, "payments_topic"."name" ASC, "f" ASC
##Some notes
- Doesn’t hesitate to use raw queries, specially when the performance is the most important thing. Moreover, sometimes it is a must since you can’t get the same result using Django’s ORM; in other cases you can, but once in a while having clean and understandable code is more important than the performance in this piece of code.
distinct
with positional arguments is used in this answer, which is available for PostgreSQL only, atm. In the docs you can see more about conditional expressions.
Raw Queries.
topics = Topic.objects.raw('''
select * from bar_topic
left join (select topic_id as tid, value from bar_record where user_id = 1) AS subq
on tid = bar_topic.id
''')
You seem to know the answer yourself. There is nothing wrong with using a raw query when you can’t get the ORM query to behave exactly the way you want it to.
One main drawback with raw queries is that they are not cached like ORM queries. That means if you iterate through the raw queryset twice, the query will be repeated. Another is that you cannot call .count() on it.
Null Foreign Keys
You can force the ORM to use LEFT OUTER JOIN BY setting null=True
in the foreign keys. Do this with the tables as they are.
print Record.objects.filter(user_id=8).select_related('topic').query
The result is
SELECT "bar_record"."id", "bar_record"."user_id", "bar_record"."topic_id", "bar_record"."value", "bar_topic"."id", "bar_topic"."name" FROM "bar_record"
INNER JOIN "bar_topic" ON ( "bar_record"."topic_id" = "bar_topic"."id" ) WHERE "bar_record"."user_id" = 8
Now set, null=True and perform the same ORM query as above. The result is
SELECT "bar_record"."id", "bar_record"."user_id", "bar_record"."topic_id", "bar_record"."value", "bar_topic"."id", "bar_topic"."name" FROM "bar_record"
LEFT OUTER JOIN "bar_topic" ON ( "bar_record"."topic_id" = "bar_topic"."id" ) WHERE "bar_record"."user_id" = 8
Note how the query has suddenly changed to LEFT OUTER JOIN
. But we are not out of the woods yet because the order of tables should be reversed! Thus unless you can restructure your models, an ORM LEFT OUTER JOIN may not be entirely possible without chaining or UNION both of which you have tried out already.
This is how I would do it. Two queries, not one:
class Topic(models.Model):
#...
@property
def user_value(self):
try:
return self.user_records[0].value
except IndexError:
#This topic does not have
#a review by the request.user
return None
except AttributeError:
raise AttributeError('You forgot to prefetch the user_records')
#or you can just
return None
#usage
topics = Topic.objects.all().prefetch_related(
models.Prefetch('record_set',
queryset=Record.objects.filter(user=request.user),
to_attr='user_records'
)
)
for topic in topics:
print topic.user_value
The benefit is that you get the whole Record
object. So consider a situation where you want to not only show the value
, but the time-stamp
also.
Just for the record I want to show one more solution using .extra
. I’m impressed no one has mention it, because it should produce the best possible performance.
topics = Topic.objects.all().extra(
select={
'user_value': """SELECT value FROM myapp_record
WHERE myapp_record.user_id = %s
AND myapp_record.topic_id = myapp_topic.id
"""
},
select_params=(request.user.id,)
)
for topic in topics
print topic.user_value
Both solutions can be abstracted into a custom TopicQuerySet
class for re-usability.
class TopicQuerySet(models.QuerySet):
def prefetch_user_records(self, user):
return self.prefetch_related(
models.Prefetch('record_set',
queryset=Record.objects.filter(user=request.user),
to_attr='user_records'
)
)
def annotate_user_value(self, user):
return self.extra(
select={
'user_value': """SELECT value FROM myapp_record
WHERE myapp_record.user_id = %s
AND myapp_record.topic_id = myapp_topic.id
"""
},
select_params=(user.id,)
)
class Topic(models.Model):
#...
objects = TopicQuerySet.as_manager()
#usage
topics = Topic.objects.all().annotate_user_value(request.user)
#or
topics = Topic.objects.all().prefetch_user_records(request.user)
for topic in topics:
print topic.user_value
This more universal solution inspired by trinchet’s answer works also with other databases:
>>> qs = Topic.objects.annotate(
... f=Max(Case(When(record__user=johnny, then=F('record__value'))))
... )
example data
>>> print(qs.values_list('name', 'f'))
[(u'A', 1), (u'B', None), (u'C', 3)]
verify the query
>>> print(qs.query) # formated and removed excessive double quotes
SELECT bar_topic.id, bar_topic.name,
MAX(CASE WHEN bar_record.user_id = 1 THEN bar_record.value ELSE NULL END) AS f
FROM bar_topic LEFT OUTER JOIN bar_record ON (bar_topic.id = bar_record.topic_id)
GROUP BY bar_topic.id, bar_topic.name
Advantages (compared to the original solutions)
- It works also with SQLite.
- The query set can be easily filtered or sorted, no matter how.
- No type cast
output_field
is necessary.
- The methods
values
or values_list(*field_names)
are useful for a more simple GROUP BY
, but they are not necessary.
The left join can be made more readable by writing a function:
from django.db.models import Max, Case, When, F
def left_join(result_field, **lookups):
return Max(Case(When(then=F(result_field), **lookups)))
>>> Topic.objects.annotate(
... record_value=left_join('record__value', record__user=johnny),
... ).values_list('name', 'record_value')
More fields from Record can be added by anotate
method to results this way with nice mnemonic names.
I agree with other authors that it can be optimized, but readability counts.
EDIT: The same result comes if the aggregation function Max
is replaced by Min
. Both Min and Max ignore NULL values and can be used on any type, e.g. for strings. The aggregation is useful if the left join is not guaranteed to be unique. If the field is numeric than it can be useful to use average value Avg
on the left join.
What I essentially want is this
select * from bar_topic
left join (select topic_id as tid, value from bar_record where user_id = 1)
on tid = bar_topic.id
…or, perhaps this equivalent which avoids a subquery…
select * from bar_topic
left join bar_record
on bar_record.topic_id = bar_topic.id and bar_record.user_id = 1
I want to know how to do this effectively, or, if it is impossible, an explanation of why it is impossible…
Unless you use raw queries, it’s impossible with Django’s ORM, and here’s why.
QuerySet
objects (django.db.models.query.QuerySet
) have a query
attribute (django.db.models.sql.query.Query
) which is a representation of the actual query which will be performed. These Query
objects helpfully have a __str__
method, so you can print it out to see what it is.
Let’s start with a simple QuerySet
…
>>> from bar.models import *
>>> qs = Topic.objects.filter(record__user_id=1)
>>> print qs.query
SELECT "bar_topic"."id", "bar_topic"."name" FROM "bar_topic" INNER JOIN "bar_record" ON ("bar_topic"."id" = "bar_record"."topic_id") WHERE "bar_record"."user_id" = 1
…which is obviously not going to work, due to the INNER JOIN
.
Taking a deeper look inside the Query
object, there’s an alias_map
attribute which determines what table joins will be performed…
>>> from pprint import pprint
>>> pprint(qs.query.alias_map)
{u'bar_record': JoinInfo(table_name=u'bar_record', rhs_alias=u'bar_record', join_type='INNER JOIN', lhs_alias=u'bar_topic', lhs_join_col=u'id', rhs_join_col='topic_id', nullable=True),
u'bar_topic': JoinInfo(table_name=u'bar_topic', rhs_alias=u'bar_topic', join_type=None, lhs_alias=None, lhs_join_col=None, rhs_join_col=None, nullable=False),
u'auth_user': JoinInfo(table_name=u'auth_user', rhs_alias=u'auth_user', join_type='INNER JOIN', lhs_alias=u'bar_record', lhs_join_col='user_id', rhs_join_col=u'id', nullable=False)}
Note that Django only supports two possible join_type
s, INNER JOIN
and LEFT OUTER JOIN
(source).
Now, we can use the Query
object’s promote_joins
methods to use a LEFT OUTER JOIN
on the bar_record
table…
>>> qs.query.promote_joins(['bar_record'])
>>> pprint(qs.query.alias_map)
{u'bar_record': JoinInfo(table_name=u'bar_record', rhs_alias=u'bar_record', join_type='LEFT OUTER JOIN', lhs_alias=u'bar_topic', lhs_join_col=u'id', rhs_join_col='topic_id', nullable=True),
u'bar_topic': JoinInfo(table_name=u'bar_topic', rhs_alias=u'bar_topic', join_type=None, lhs_alias=None, lhs_join_col=None, rhs_join_col=None, nullable=False),
u'auth_user': JoinInfo(table_name=u'auth_user', rhs_alias=u'auth_user', join_type='LEFT OUTER JOIN', lhs_alias=u'bar_record', lhs_join_col='user_id', rhs_join_col=u'id', nullable=False)}
…which will change the query to…
>>> print qs.query
SELECT "bar_topic"."id", "bar_topic"."name" FROM "bar_topic" LEFT OUTER JOIN "bar_record" ON ("bar_topic"."id" = "bar_record"."topic_id") WHERE "bar_record"."user_id" = 1
…however, this is still no use, since the join will always match a row, even if it doesn’t belong to the correct user, and the WHERE
clause will filter it out.
Using values_list()
automatically influences the join_type
…
>>> qs = Topic.objects.filter(record__user_id=1).values_list('name', 'record__value')
>>> print qs.query
SELECT "bar_topic"."name", "bar_record"."value" FROM "bar_topic" LEFT OUTER JOIN "bar_record" ON ("bar_topic"."id" = "bar_record"."topic_id") WHERE "bar_record"."user_id" = 1
…but ultimately suffers from the same problem.
There is, unfortunately, a fundamental limitation in the joins generated by the ORM, in that they can only be of the form…
(LEFT OUTER|INNER) JOIN <lhs_alias> ON (<lhs_alias>.<lhs_join_col> = <rhs_alias>.<rhs_join_col>)
…so there’s really no way to achieve your desired SQL, other than using a raw query.
Sure, you can hack around with things like annotate()
and extra()
, but they’ll likely generate queries which are far less performant, and arguably no more readable than raw SQL.
…and a suggested alternative.
Personally, I’d just use the raw query…
select * from bar_topic
left join bar_record
on bar_record.topic_id = bar_topic.id and bar_record.user_id = 1
…which is simple enough to be compatible with all of Django’s supported backends.
Django 2.0 introduced FilteredRelation
objects, which I believe are what you want. This
print('nnew ormn---')
with self.assertNumQueries(1):
topics = Topic.objects.annotate(
filtered_record=FilteredRelation(
'record', condition=Q(record__user_id=1)
)
).values_list('name', 'filtered_record__value')
for topic in topics:
print(*topic)
produces the expected table:
new orm
---
A 1
B None
C 3
The query Django outputs:
SELECT "bar_topic"."name", filtered_record."value"
FROM "bar_topic" LEFT OUTER JOIN "bar_record" filtered_record
ON ("bar_topic"."id" = filtered_record."topic_id"
AND (filtered_record."user_id" = 1))
Leaving this out here since it solved our issue and it may help someone else.
- We had two tables, let’s call them Client and Contract.
- Client refers to Contract via a foreign key (many to one)
- For some Client-records the Contract-record was deleted
- We wanted to find the Clients which referred to Contract which no longer existed
We started out using OuterRef
but both Client and Contract are very large tables. OuterRef
is translated to a WHERE EXISTS
, which was painfully slow. A lot of the above options also didn’t work for us. Here is what we did.
from django.db.models.sql.constants import LOUTER
queryset = Client.objects.filter(contract__date=None)
queryset.query.alias_map['contract'].join_type = LOUTER
Running print(str(queryset.query))
does indeed yield a query with a left outer join and the results also matches expectations.
Closing observations:
contract__id=None
wouldn’t work because Django is too clever for its own good and just checks contract_id=None
on the child table, which isn’t what you’re looking for with orphaned records. We picked a NOT NULL
field in the parent table (contract__date
in our example), forcing Django to make the join.
- Inspect the
alias_map
to see how Django has named your aliases.
I have a website where users can see a list of movies, and create reviews for them.
The user should be able to see the list of all the movies. Additionally, IF they have reviewed the movie, they should be able to see the score that they gave it. If not, the movie is just displayed without the score.
They do not care at all about the scores provided by other users.
Consider the following models.py
from django.contrib.auth.models import User
from django.db import models
class Topic(models.Model):
name = models.TextField()
def __str__(self):
return self.name
class Record(models.Model):
user = models.ForeignKey(User)
topic = models.ForeignKey(Topic)
value = models.TextField()
class Meta:
unique_together = ("user", "topic")
What I essentially want is this
select * from bar_topic
left join (select topic_id as tid, value from bar_record where user_id = 1)
on tid = bar_topic.id
Consider the following test.py
for context:
from django.test import TestCase
from bar.models import *
from django.db.models import Q
class TestSuite(TestCase):
def setUp(self):
t1 = Topic.objects.create(name="A")
t2 = Topic.objects.create(name="B")
t3 = Topic.objects.create(name="C")
# 2 for Johnny
johnny = User.objects.create(username="Johnny")
johnny.record_set.create(topic=t1, value=1)
johnny.record_set.create(topic=t3, value=3)
# 3 for Mary
mary = User.objects.create(username="Mary")
mary.record_set.create(topic=t1, value=4)
mary.record_set.create(topic=t2, value=5)
mary.record_set.create(topic=t3, value=6)
def test_raw(self):
print('nrawn---')
with self.assertNumQueries(1):
topics = Topic.objects.raw('''
select * from bar_topic
left join (select topic_id as tid, value from bar_record where user_id = 1)
on tid = bar_topic.id
''')
for topic in topics:
print(topic, topic.value)
def test_orm(self):
print('normn---')
with self.assertNumQueries(1):
topics = Topic.objects.filter(Q(record__user_id=1)).values_list('name', 'record__value')
for topic in topics:
print(*topic)
BOTH tests should print the exact same output, however, only the raw version spits out the correct table of results:
raw --- A 1 B None C 3
the orm instead returns this
orm --- A 1 C 3
Any attempt to join back the rest of the topics, those that have no reviews from user “johnny”, result in the following:
orm
---
A 1
A 4
B 5
C 3
C 6
How can I accomplish the simple behavior of the raw query with the Django ORM?
edit: This sort of works but seems very poor:
topics = Topic.objects.filter(record__user_id=1).values_list('name', 'record__value') noned = Topic.objects.exclude(record__user_id=1).values_list('name') for topic in chain(topics, noned): ...
edit: This works a little bit better, but still bad:
topics = Topic.objects.filter(record__user_id=1).annotate(value=F('record__value')) topics |= Topic.objects.exclude(pk__in=topics)
orm --- A 1 B 5 C 3
First of all, there is no a way (atm Django 1.9.7) to have a representation with Django’s ORM of the raw query you posted, exactly as you want; however, you can get the same desired result with something like:
>>> Topic.objects.annotate(
f=Case(
When(
record__user=johnny,
then=F('record__value')
),
output_field=IntegerField()
)
).order_by(
'id', 'name', 'f'
).distinct(
'id', 'name'
).values_list(
'name', 'f'
)
>>> [(u'A', 1), (u'B', None), (u'C', 3)]
>>> Topic.objects.annotate(f=Case(When(record__user=may, then=F('record__value')), output_field=IntegerField())).order_by('id', 'name', 'f').distinct('id', 'name').values_list('name', 'f')
>>> [(u'A', 4), (u'B', 5), (u'C', 6)]
Here the SQL generated for the first query:
>>> print Topic.objects.annotate(f=Case(When(record__user=johnny, then=F('record__value')), output_field=IntegerField())).order_by('id', 'name', 'f').distinct('id', 'name').values_list('name', 'f').query
>>> SELECT DISTINCT ON ("payments_topic"."id", "payments_topic"."name") "payments_topic"."name", CASE WHEN "payments_record"."user_id" = 1 THEN "payments_record"."value" ELSE NULL END AS "f" FROM "payments_topic" LEFT OUTER JOIN "payments_record" ON ("payments_topic"."id" = "payments_record"."topic_id") ORDER BY "payments_topic"."id" ASC, "payments_topic"."name" ASC, "f" ASC
##Some notes
- Doesn’t hesitate to use raw queries, specially when the performance is the most important thing. Moreover, sometimes it is a must since you can’t get the same result using Django’s ORM; in other cases you can, but once in a while having clean and understandable code is more important than the performance in this piece of code.
distinct
with positional arguments is used in this answer, which is available for PostgreSQL only, atm. In the docs you can see more about conditional expressions.
Raw Queries.
topics = Topic.objects.raw('''
select * from bar_topic
left join (select topic_id as tid, value from bar_record where user_id = 1) AS subq
on tid = bar_topic.id
''')
You seem to know the answer yourself. There is nothing wrong with using a raw query when you can’t get the ORM query to behave exactly the way you want it to.
One main drawback with raw queries is that they are not cached like ORM queries. That means if you iterate through the raw queryset twice, the query will be repeated. Another is that you cannot call .count() on it.
Null Foreign Keys
You can force the ORM to use LEFT OUTER JOIN BY setting null=True
in the foreign keys. Do this with the tables as they are.
print Record.objects.filter(user_id=8).select_related('topic').query
The result is
SELECT "bar_record"."id", "bar_record"."user_id", "bar_record"."topic_id", "bar_record"."value", "bar_topic"."id", "bar_topic"."name" FROM "bar_record"
INNER JOIN "bar_topic" ON ( "bar_record"."topic_id" = "bar_topic"."id" ) WHERE "bar_record"."user_id" = 8
Now set, null=True and perform the same ORM query as above. The result is
SELECT "bar_record"."id", "bar_record"."user_id", "bar_record"."topic_id", "bar_record"."value", "bar_topic"."id", "bar_topic"."name" FROM "bar_record"
LEFT OUTER JOIN "bar_topic" ON ( "bar_record"."topic_id" = "bar_topic"."id" ) WHERE "bar_record"."user_id" = 8
Note how the query has suddenly changed to LEFT OUTER JOIN
. But we are not out of the woods yet because the order of tables should be reversed! Thus unless you can restructure your models, an ORM LEFT OUTER JOIN may not be entirely possible without chaining or UNION both of which you have tried out already.
This is how I would do it. Two queries, not one:
class Topic(models.Model):
#...
@property
def user_value(self):
try:
return self.user_records[0].value
except IndexError:
#This topic does not have
#a review by the request.user
return None
except AttributeError:
raise AttributeError('You forgot to prefetch the user_records')
#or you can just
return None
#usage
topics = Topic.objects.all().prefetch_related(
models.Prefetch('record_set',
queryset=Record.objects.filter(user=request.user),
to_attr='user_records'
)
)
for topic in topics:
print topic.user_value
The benefit is that you get the whole Record
object. So consider a situation where you want to not only show the value
, but the time-stamp
also.
Just for the record I want to show one more solution using .extra
. I’m impressed no one has mention it, because it should produce the best possible performance.
topics = Topic.objects.all().extra(
select={
'user_value': """SELECT value FROM myapp_record
WHERE myapp_record.user_id = %s
AND myapp_record.topic_id = myapp_topic.id
"""
},
select_params=(request.user.id,)
)
for topic in topics
print topic.user_value
Both solutions can be abstracted into a custom TopicQuerySet
class for re-usability.
class TopicQuerySet(models.QuerySet):
def prefetch_user_records(self, user):
return self.prefetch_related(
models.Prefetch('record_set',
queryset=Record.objects.filter(user=request.user),
to_attr='user_records'
)
)
def annotate_user_value(self, user):
return self.extra(
select={
'user_value': """SELECT value FROM myapp_record
WHERE myapp_record.user_id = %s
AND myapp_record.topic_id = myapp_topic.id
"""
},
select_params=(user.id,)
)
class Topic(models.Model):
#...
objects = TopicQuerySet.as_manager()
#usage
topics = Topic.objects.all().annotate_user_value(request.user)
#or
topics = Topic.objects.all().prefetch_user_records(request.user)
for topic in topics:
print topic.user_value
This more universal solution inspired by trinchet’s answer works also with other databases:
>>> qs = Topic.objects.annotate(
... f=Max(Case(When(record__user=johnny, then=F('record__value'))))
... )
example data
>>> print(qs.values_list('name', 'f'))
[(u'A', 1), (u'B', None), (u'C', 3)]
verify the query
>>> print(qs.query) # formated and removed excessive double quotes
SELECT bar_topic.id, bar_topic.name,
MAX(CASE WHEN bar_record.user_id = 1 THEN bar_record.value ELSE NULL END) AS f
FROM bar_topic LEFT OUTER JOIN bar_record ON (bar_topic.id = bar_record.topic_id)
GROUP BY bar_topic.id, bar_topic.name
Advantages (compared to the original solutions)
- It works also with SQLite.
- The query set can be easily filtered or sorted, no matter how.
- No type cast
output_field
is necessary. - The methods
values
orvalues_list(*field_names)
are useful for a more simpleGROUP BY
, but they are not necessary.
The left join can be made more readable by writing a function:
from django.db.models import Max, Case, When, F
def left_join(result_field, **lookups):
return Max(Case(When(then=F(result_field), **lookups)))
>>> Topic.objects.annotate(
... record_value=left_join('record__value', record__user=johnny),
... ).values_list('name', 'record_value')
More fields from Record can be added by anotate
method to results this way with nice mnemonic names.
I agree with other authors that it can be optimized, but readability counts.
EDIT: The same result comes if the aggregation function Max
is replaced by Min
. Both Min and Max ignore NULL values and can be used on any type, e.g. for strings. The aggregation is useful if the left join is not guaranteed to be unique. If the field is numeric than it can be useful to use average value Avg
on the left join.
What I essentially want is this
select * from bar_topic left join (select topic_id as tid, value from bar_record where user_id = 1) on tid = bar_topic.id
…or, perhaps this equivalent which avoids a subquery…
select * from bar_topic
left join bar_record
on bar_record.topic_id = bar_topic.id and bar_record.user_id = 1
I want to know how to do this effectively, or, if it is impossible, an explanation of why it is impossible…
Unless you use raw queries, it’s impossible with Django’s ORM, and here’s why.
QuerySet
objects (django.db.models.query.QuerySet
) have a query
attribute (django.db.models.sql.query.Query
) which is a representation of the actual query which will be performed. These Query
objects helpfully have a __str__
method, so you can print it out to see what it is.
Let’s start with a simple QuerySet
…
>>> from bar.models import *
>>> qs = Topic.objects.filter(record__user_id=1)
>>> print qs.query
SELECT "bar_topic"."id", "bar_topic"."name" FROM "bar_topic" INNER JOIN "bar_record" ON ("bar_topic"."id" = "bar_record"."topic_id") WHERE "bar_record"."user_id" = 1
…which is obviously not going to work, due to the INNER JOIN
.
Taking a deeper look inside the Query
object, there’s an alias_map
attribute which determines what table joins will be performed…
>>> from pprint import pprint
>>> pprint(qs.query.alias_map)
{u'bar_record': JoinInfo(table_name=u'bar_record', rhs_alias=u'bar_record', join_type='INNER JOIN', lhs_alias=u'bar_topic', lhs_join_col=u'id', rhs_join_col='topic_id', nullable=True),
u'bar_topic': JoinInfo(table_name=u'bar_topic', rhs_alias=u'bar_topic', join_type=None, lhs_alias=None, lhs_join_col=None, rhs_join_col=None, nullable=False),
u'auth_user': JoinInfo(table_name=u'auth_user', rhs_alias=u'auth_user', join_type='INNER JOIN', lhs_alias=u'bar_record', lhs_join_col='user_id', rhs_join_col=u'id', nullable=False)}
Note that Django only supports two possible join_type
s, INNER JOIN
and LEFT OUTER JOIN
(source).
Now, we can use the Query
object’s promote_joins
methods to use a LEFT OUTER JOIN
on the bar_record
table…
>>> qs.query.promote_joins(['bar_record'])
>>> pprint(qs.query.alias_map)
{u'bar_record': JoinInfo(table_name=u'bar_record', rhs_alias=u'bar_record', join_type='LEFT OUTER JOIN', lhs_alias=u'bar_topic', lhs_join_col=u'id', rhs_join_col='topic_id', nullable=True),
u'bar_topic': JoinInfo(table_name=u'bar_topic', rhs_alias=u'bar_topic', join_type=None, lhs_alias=None, lhs_join_col=None, rhs_join_col=None, nullable=False),
u'auth_user': JoinInfo(table_name=u'auth_user', rhs_alias=u'auth_user', join_type='LEFT OUTER JOIN', lhs_alias=u'bar_record', lhs_join_col='user_id', rhs_join_col=u'id', nullable=False)}
…which will change the query to…
>>> print qs.query
SELECT "bar_topic"."id", "bar_topic"."name" FROM "bar_topic" LEFT OUTER JOIN "bar_record" ON ("bar_topic"."id" = "bar_record"."topic_id") WHERE "bar_record"."user_id" = 1
…however, this is still no use, since the join will always match a row, even if it doesn’t belong to the correct user, and the WHERE
clause will filter it out.
Using values_list()
automatically influences the join_type
…
>>> qs = Topic.objects.filter(record__user_id=1).values_list('name', 'record__value')
>>> print qs.query
SELECT "bar_topic"."name", "bar_record"."value" FROM "bar_topic" LEFT OUTER JOIN "bar_record" ON ("bar_topic"."id" = "bar_record"."topic_id") WHERE "bar_record"."user_id" = 1
…but ultimately suffers from the same problem.
There is, unfortunately, a fundamental limitation in the joins generated by the ORM, in that they can only be of the form…
(LEFT OUTER|INNER) JOIN <lhs_alias> ON (<lhs_alias>.<lhs_join_col> = <rhs_alias>.<rhs_join_col>)
…so there’s really no way to achieve your desired SQL, other than using a raw query.
Sure, you can hack around with things like annotate()
and extra()
, but they’ll likely generate queries which are far less performant, and arguably no more readable than raw SQL.
…and a suggested alternative.
Personally, I’d just use the raw query…
select * from bar_topic
left join bar_record
on bar_record.topic_id = bar_topic.id and bar_record.user_id = 1
…which is simple enough to be compatible with all of Django’s supported backends.
Django 2.0 introduced FilteredRelation
objects, which I believe are what you want. This
print('nnew ormn---')
with self.assertNumQueries(1):
topics = Topic.objects.annotate(
filtered_record=FilteredRelation(
'record', condition=Q(record__user_id=1)
)
).values_list('name', 'filtered_record__value')
for topic in topics:
print(*topic)
produces the expected table:
new orm
---
A 1
B None
C 3
The query Django outputs:
SELECT "bar_topic"."name", filtered_record."value"
FROM "bar_topic" LEFT OUTER JOIN "bar_record" filtered_record
ON ("bar_topic"."id" = filtered_record."topic_id"
AND (filtered_record."user_id" = 1))
Leaving this out here since it solved our issue and it may help someone else.
- We had two tables, let’s call them Client and Contract.
- Client refers to Contract via a foreign key (many to one)
- For some Client-records the Contract-record was deleted
- We wanted to find the Clients which referred to Contract which no longer existed
We started out using OuterRef
but both Client and Contract are very large tables. OuterRef
is translated to a WHERE EXISTS
, which was painfully slow. A lot of the above options also didn’t work for us. Here is what we did.
from django.db.models.sql.constants import LOUTER
queryset = Client.objects.filter(contract__date=None)
queryset.query.alias_map['contract'].join_type = LOUTER
Running print(str(queryset.query))
does indeed yield a query with a left outer join and the results also matches expectations.
Closing observations:
contract__id=None
wouldn’t work because Django is too clever for its own good and just checkscontract_id=None
on the child table, which isn’t what you’re looking for with orphaned records. We picked aNOT NULL
field in the parent table (contract__date
in our example), forcing Django to make the join.- Inspect the
alias_map
to see how Django has named your aliases.