How can I speed up Django Query aggregations on a 10 million row database with 60 features?

Question:

I have a database table in psql which contains of 10,000,000 rows and 60 columns (features). I define a Django Queryset as follows:

MyQ=MyDataBase.objects.filter(Name='Mike', date=date(2018, 2, 11), 
Class='03')

There are only 5 rows that satisfy the above filter. But when I try something like

MyQ.count() #which equals 5

or

MyQ.aggregate(Sum('Score'))['Score__sum'] #which equals 61

each take about 3 minutes to give me the result. Isn’t that weird? Aren’t query sets supposed to make life easier by focusing only on the rows that we have told them to focus on? counting 5 rows or summing one of the fields of them must not take that long. What am I doing wrong?

I should also say this. The first time that I tried this code on this table, everything was fine and it took maybe 1 second to catch the result but now the 3 minutes is really annoying. And since then I have not changed anything in the database or the code.

Asked By: Mike

||

Answers:

Generally if you are filtering your table based on a certain field or number of fields, you should create an index on those fields. It allows the database query planner to take a more optimized path when searching/sorting.

It looks like you’re using Postgres from your question, so you can run SELECT * FROM pg_indexes WHERE tablename = 'yourtable'; in psql to see any existing indexes.

Django can create these indexes for you in your model definition. For example, your model MyDatabase might look something like this:

class MyDatabase(models.Model):
    name = models.TextField(index=True)
    date = models.DateField(index=True)
   class = models.TextField(index=True)

Here’s some more reading specific to creating indexes on Django models: gun.io/blog/learn-indexing-dammit

Answered By: Ben Siver

There should be well index database table which is filtering.

If you aggerate multiple time with group_by then use CTE.
The django-cte packge supporting it.

If you have historical data and performing multiple time aggregation then use
metalized view in postgres or respective db alternative.
You can refresh the data periodically in by scheduling the job.
in the mat view you can also create index.

Answered By: Chandan Kumar Singh