Django ORM multiple chained JOIN equivalent and aggregation

Question:

Given the following Django models (several are shown just as an example; the chain could be more or less deeply nested):

class ModelA(models.Model):
    value = models.IntegerField()

class ModelB(models.Model):
    modelA = models.ForeignKey(ModelA, on_delete=models.CASCADE)
    value = models.IntegerField()

class ModelC(models.Model):
    modelB = models.ForeignKey(ModelB, on_delete=models.CASCADE)
    value = models.IntegerField()

class ModelD(models.Model):
    modelC = models.ForeignKey(ModelC, on_delete=models.CASCADE)
    value = models.IntegerField()

class ModelE(models.Model):
    modelD = models.ForeignKey(ModelD, on_delete=models.CASCADE)
    value = models.IntegerField()

# etc...

How can we use the Django ORM to do the following operations:

e.g. all ModelE rows for a given ModelA. SQL equivalent:

SELECT ModelE.*
FROM ModelA
JOIN ModelB ON ModelB.modelA = ModelA.id
JOIN ModelC ON ModelC.modelB = ModelB.id
JOIN ModelD ON ModelD.modelC = ModelC.id
JOIN ModelE ON ModelE.modelD = ModelD.id
WHERE ModelA.id = 1

e.g. group all records by some model, SQL equivalent:

SELECT ModelC.*, SUM(ModelE.value)
FROM ModelA
JOIN ModelB ON ModelB.modelA = ModelA.id
JOIN ModelC ON ModelC.modelB = ModelB.id
JOIN ModelD ON ModelD.modelC = ModelC.id
JOIN ModelE ON ModelE.modelD = ModelD.id
WHERE ModelA.id = 1
GROUP BY ModelC.id

The specific query I’m trying to get is equivalent to the following:

SELECT ModelC.value * SUM(ModelE.value)
FROM ModelA
JOIN ModelB ON ModelB.modelA = ModelA.id
JOIN ModelC ON ModelC.modelB = ModelB.id
JOIN ModelD ON ModelD.modelC = ModelC.id
JOIN ModelE ON ModelE.modelD = ModelD.id
WHERE ModelA.id = 1 AND ModelD.value >= 1 AND ModelD.value < 3
GROUP BY ModelC.id

I currently use a Python workaround, which is quite inefficient but much easier to understand. Is there a way to do this with the Django ORM instead?

Asked By: IbbyStack


Answers:

I'm not sure this matches exactly what you want, but you may be able to adapt the following ORM code.

from django.db.models import F, Sum

queryset = (
    ModelC.objects
    .annotate(
        sum_e_values=Sum('modeld__modele__value'),
        result_value=F('value') * F('sum_e_values'),
    )
    .filter(
        modelB__modelA_id=1,
        modeld__value__gte=1,
        modeld__value__lt=3,
    )
    .values('result_value')
)
print(queryset.query)

Output:

SELECT ("myapp_modelc"."value" * SUM("myapp_modele"."value")) AS "result_value"
FROM "myapp_modelc"
LEFT OUTER JOIN "myapp_modeld" ON ("myapp_modelc"."id" = "myapp_modeld"."modelC_id")
LEFT OUTER JOIN "myapp_modele" ON ("myapp_modeld"."id" = "myapp_modele"."modelD_id")
INNER JOIN "myapp_modelb" ON ("myapp_modelc"."modelB_id" = "myapp_modelb"."id")
INNER JOIN "myapp_modeld" T6 ON ("myapp_modelc"."id" = T6."modelC_id")
WHERE ("myapp_modelb"."modelA_id" = 1
       AND T6."value" < 3
       AND T6."value" >= 1)
GROUP BY "myapp_modelc"."id",
         "myapp_modelc"."modelB_id",
         "myapp_modelc"."value"
Answered By: gypark
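
One thing to watch in the SQL printed in this answer: modeld is joined twice, and the value filter applies only to the second alias (T6), so the SUM still runs over every ModelE row and is repeated once per matching T6 row. The sketch below is a rough illustration using the standard-library sqlite3 module, not Django itself. The table layout and sample rows are made up for the example, and the modelb join is left out for brevity.

```python
import sqlite3

# Hypothetical miniature schema: one ModelC row (value 2) with three ModelD
# children, each of which has a single ModelE child.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE modelc (id INTEGER PRIMARY KEY, value INTEGER);
CREATE TABLE modeld (id INTEGER PRIMARY KEY, modelC_id INTEGER, value INTEGER);
CREATE TABLE modele (id INTEGER PRIMARY KEY, modelD_id INTEGER, value INTEGER);
INSERT INTO modelc VALUES (1, 2);
INSERT INTO modeld VALUES (1, 1, 1), (2, 1, 2), (3, 1, 5);
INSERT INTO modele VALUES (1, 1, 10), (2, 2, 20), (3, 3, 40);
""")

# Same shape as the printed query: modeld is joined twice and only the
# second alias (T6) is filtered.
double_join_sql = """
SELECT modelc.value * SUM(modele.value)
FROM modelc
LEFT OUTER JOIN modeld ON modelc.id = modeld.modelC_id
LEFT OUTER JOIN modele ON modeld.id = modele.modelD_id
INNER JOIN modeld T6 ON modelc.id = T6.modelC_id
WHERE T6.value >= 1 AND T6.value < 3
GROUP BY modelc.id
"""

# Shape where the filter and the sum share a single join on modeld.
single_join_sql = """
SELECT modelc.value * SUM(modele.value)
FROM modelc
INNER JOIN modeld ON modelc.id = modeld.modelC_id
LEFT OUTER JOIN modele ON modeld.id = modele.modelD_id
WHERE modeld.value >= 1 AND modeld.value < 3
GROUP BY modelc.id
"""

double_join_result = conn.execute(double_join_sql).fetchone()[0]
single_join_result = conn.execute(single_join_sql).fetchone()[0]
print(double_join_result, single_join_result)
```

With this sample data the double-join query gives 280 and the single-join query gives 60, which is the value the question's SQL asks for. The Django aggregation documentation describes calling filter() before annotate() as the way to make a filter constrain the rows an annotation is computed over, so swapping the two calls is the likely fix. That has not been re-run against the models in this thread.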

The answer turned out to be surprisingly simple, although I could not find it stated explicitly anywhere. Thanks to @gypark for pointing out the right approach as well!

The queries below do not match the given SQL exactly, but they give the same kind of results.

For the first problem, the following ORM query works:

# Selects all ModelE for a given ModelA
ModelE.objects.filter(modelD__modelC__modelB__modelA_id=1)

This generates the following equivalent SQL (changed for clarity):

SELECT modele.id, modele.modelD_id, modele.value
FROM modele
INNER JOIN modeld ON (modele.modelD_id = modeld.id)
INNER JOIN modelc ON (modeld.modelC_id = modelc.id)
INNER JOIN modelb ON (modelc.modelB_id = modelb.id)
INNER JOIN modela ON (modelb.modelA_id = modela.id)
WHERE modela.id = 1

For the second:

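# Annotates each ModelC of the given ModelA with the sum of its ModelE values (0 if there are none)
ModelC.objects.filter(modelB__modelA_id=1).annotate(sumE=Coalesce(Sum('modeld__modele__value'), 0))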

Which generates the following SQL equivalent:

SELECT modelc.id, modelc.modelB_id, modelc.value, COALESCE(SUM(modele.value), 0) AS sumE
FROM modelc
INNER JOIN modelb ON (modelc.modelB_id = modelb.id)
LEFT OUTER JOIN modeld ON (modelc.id = modeld.modelC_id)
LEFT OUTER JOIN modele ON (modeld.id = modele.modelD_id)
WHERE modelb.modelA_id = 1
GROUP BY modelc.id, modelc.modelB_id, modelc.value

For the third:

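from django.db.models import F, Sum
from django.db.models.functions import Coalesce

# Total of ModelC.value * ModelE.value over ModelD rows with 1 <= value < 3
ModelA.objects.filter(
    id=1,
    modelb__modelc__modeld__value__gte=1,
    modelb__modelc__modeld__value__lt=3,
).aggregate(
    sum=Coalesce(Sum(F('modelb__modelc__value') * F('modelb__modelc__modeld__modele__value')), 0)
)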

Which gives the following SQL:

SELECT COALESCE(SUM((modelc.value * modele.value)), 0) AS sum
FROM modela
INNER JOIN modelb ON (modela.id = modelb.modelA_id)
INNER JOIN modelc ON (modelb.id = modelc.modelB_id)
INNER JOIN modeld ON (modelc.id = modeld.modelC_id)
LEFT OUTER JOIN modele ON (modeld.id = modele.modelD_id)
WHERE (modela.id = 1 AND modeld.value >= 1 AND modeld.value < 3)
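
The aggregate above returns a single grand total, whereas the SQL in the question returns one row per ModelC. As a rough check that the two agree once the per-ModelC products are added up, here is a minimal sketch using the standard-library sqlite3 module rather than Django. The table layout and sample rows are assumptions made for the example, and the modela and modelb joins are omitted.

```python
import sqlite3

# Hypothetical data: two ModelC rows (values 2 and 3), each with two ModelD
# children, each of which has one ModelE child.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE modelc (id INTEGER PRIMARY KEY, value INTEGER);
CREATE TABLE modeld (id INTEGER PRIMARY KEY, modelC_id INTEGER, value INTEGER);
CREATE TABLE modele (id INTEGER PRIMARY KEY, modelD_id INTEGER, value INTEGER);
INSERT INTO modelc VALUES (1, 2), (2, 3);
INSERT INTO modeld VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 7);
INSERT INTO modele VALUES (1, 1, 10), (2, 2, 20), (3, 3, 5), (4, 4, 100);
""")

# Joins and filter shared by both queries.
joins = """
FROM modelc
INNER JOIN modeld ON modelc.id = modeld.modelC_id
LEFT OUTER JOIN modele ON modeld.id = modele.modelD_id
WHERE modeld.value >= 1 AND modeld.value < 3
"""

# One product per ModelC, as in the question's GROUP BY query.
per_group = [row[0] for row in conn.execute(
    "SELECT modelc.value * SUM(modele.value) " + joins
    + " GROUP BY modelc.id ORDER BY modelc.id"
)]

# A single total, as in the aggregate() query above.
grand_total = conn.execute(
    "SELECT COALESCE(SUM(modelc.value * modele.value), 0) " + joins
).fetchone()[0]

print(per_group, grand_total)
```

With this sample data the per-group products are 60 and 15, and the grand total is 75. If one value per ModelC is needed instead of the total, the GROUP BY form is the one to reproduce.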

Note that Coalesce (from django.db.models.functions import Coalesce) is needed because Sum returns None rather than 0 when there are no matching rows.
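
The None comes from SQL itself: SUM over zero rows is NULL, and COALESCE replaces it with a default. A minimal sketch with the standard-library sqlite3 module and a hypothetical empty modele table shows the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE modele (id INTEGER PRIMARY KEY, value INTEGER)")

# No rows are inserted, so both aggregates run over an empty set.
plain_sum = conn.execute("SELECT SUM(value) FROM modele").fetchone()[0]
coalesced_sum = conn.execute(
    "SELECT COALESCE(SUM(value), 0) FROM modele"
).fetchone()[0]

print(plain_sum, coalesced_sum)
```

Here plain_sum is None and coalesced_sum is 0, which mirrors what Sum and Coalesce(Sum(...), 0) give in the ORM queries above.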

Answered By: IbbyStack