Optimising a basic group_by aggregation

Question:

It’s possible that I’m just wildly naive, but I would have thought this aggregation would be quicker, given that it’s fairly simple: no joins of any kind, and all the data sits in a single table.

It’s also likely that the answer comes down to data size rather than query efficiency or database setup, but I’m looking for a fast group-and-sum over the following table:

id    time
1     0
2     0
3     0
2     30
1     22
2     17
The idea is to group by id and sum the time column. There may be anywhere between 300 and 500 distinct ids, and around 3M rows on average. In both the MongoDB and SQL setups the id column is indexed.
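
For reference, a minimal SQLAlchemy (2.x-style) sketch of the query being described; the table and column names ("records", "id", "time") and the connection string are assumptions for illustration, not the asker's actual schema:

from sqlalchemy import create_engine, select, func, Table, Column, Integer, MetaData

engine = create_engine("sqlite:///data.db")  # assumed connection string
metadata = MetaData()
records = Table(
    "records", metadata,
    Column("id", Integer, index=True),  # indexed, as described above
    Column("time", Integer),
)

# GROUP BY id, SUM(time)
stmt = (
    select(records.c.id, func.sum(records.c.time).label("total_time"))
    .group_by(records.c.id)
)

with engine.connect() as conn:
    rows = conn.execute(stmt).all()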

Using pymongo gives me around 3 seconds to perform the query on a static database of 3M entries, while SQLAlchemy gives me around 2 seconds on the same data.
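
The pymongo side would be an aggregation pipeline along these lines; the database and collection names ("mydb", "records") and document shape ({"id": 1, "time": 22}) are assumptions for illustration:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
coll = client["mydb"]["records"]

# Group by id and sum the time field.
pipeline = [
    {"$group": {"_id": "$id", "total_time": {"$sum": "$time"}}},
]
results = list(coll.aggregate(pipeline))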

Can I safely assume that it should take that long for 3 million entries, or have I clearly missed something? For example, might a direct SQL query (as opposed to a Python-based SQLAlchemy query) be faster?

Also, note that I would like the results in JSON, which I suspect is the slow part of SQLAlchemy: creating the Python objects for the result before sending it on.
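
One way to keep that serialisation step cheap is to serialise the Core result rows directly rather than building ORM objects first. A sketch, reusing the assumed "engine" and "stmt" objects from the earlier SQLAlchemy sketch:

import json

with engine.connect() as conn:
    payload = json.dumps([
        {"id": id_, "total_time": total}
        for id_, total in conn.execute(stmt)  # rows unpack as plain tuples
    ])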

I’m familiar and confident with SQLAlchemy and pymongo, but not much else, so if there’s another database solution that’s quicker I will definitely consider it, because I would like to run this query frequently and a 2-4 second lag is a little unpleasant.

Asked By: Fonty


Answers:

It appears that this processing time is normal and that the only way to speed things up is to use an On-Demand Materialized View, as recommended by @rickhg12hs, to pre-calculate some common datasets. If the required query is more complicated than those defaults, I just accept the 2-5 second processing time.
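
A hedged sketch of that on-demand materialized view approach in MongoDB: re-run the $group periodically and write the output into a separate collection with $merge (available since MongoDB 4.2), then serve frequent reads from the small pre-aggregated collection. The collection names ("records", "time_totals") are assumptions for illustration:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["mydb"]

def refresh_time_totals():
    # Recompute the per-id totals and upsert them into "time_totals".
    db["records"].aggregate([
        {"$group": {"_id": "$id", "total_time": {"$sum": "$time"}}},
        {"$merge": {"into": "time_totals", "whenMatched": "replace"}},
    ])

# Frequent reads then hit the pre-computed collection instead of 3M rows.
totals = list(db["time_totals"].find())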

Answered By: Fonty