How efficient is it to order by distance in GeoDjango (entire table)

Question:

Assume that I have the following data model:

from django.contrib.gis.db import models

class Person(models.Model):
    id       = models.BigAutoField(primary_key=True)
    name     = models.CharField(max_length=50)
    location = models.PointField(srid=4326)

Assume also that I have an app that queries a Django backend. The only function of the app is to return a paginated list of registered users sorted from closest to farthest.

Currently I have the following query in mind:

# here we are obtaining all users in ordered form
current_location = me.location
people = Person.objects.distance(current_location).order_by('distance')

# here we are obtaining the first X through pagination
start_index = a
end_index = b

people = people[start_index:end_index]

Although this works, it is not as fast as I would like.

I am worried about the speed of this query. If the table were large (over 1 million rows), wouldn’t the PostgreSQL database with PostGIS have to calculate the distance between current_location and every location in the table before sorting those 1 million rows in the order_by step?

Can anyone suggest a more efficient alternative method for retrieving and sorting nearby users based on distance?

Asked By: AlanSTACK

||

Answers:

If you want to sort every entry in that table by distance, then it will be slow, as expected, and as far as I know there is nothing that can be done about it.

You can make the calculation more efficient by following these steps and making some assumptions:

  1. Enable spatial indexing on your tables. To do that in GeoDjango, follow the doc instructions and fit them to your model:

    Note

    In PostGIS, ST_Distance_Sphere does not limit the geometry types geographic distance queries are performed with. [4] However, these queries may take a long time, as great-circle distances must be calculated on the fly for every row in the query. This is because the spatial index on traditional geometry fields cannot be used.

    For much better performance on WGS84 distance queries, consider using geography columns in your database instead because they are able to use their spatial index in distance queries. You can tell GeoDjango to use a geography column by setting geography=True in your field definition.

  2. Now you can narrow down your query with some logical constraints:

    Ex: My user will not look for people more than 50km from his current position.

  3. Narrow down the search with the dwithin spatial lookup. It uses the spatial index mentioned above, so it is fast.

  4. Finally apply the distance order by on the remaining rows.
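For step 1, the field definition might look like the following. This is only a sketch based on the model from the question: the geography=True argument is the one change, spatial_index is already True by default on GeoDjango geometry fields, and switching an existing column would need a migration.

```python
from django.contrib.gis.db import models


class Person(models.Model):
    id = models.BigAutoField(primary_key=True)
    name = models.CharField(max_length=50)
    # geography=True asks GeoDjango to create a PostGIS geography column,
    # so distance queries on WGS84 data can use the spatial index.
    location = models.PointField(srid=4326, geography=True)
```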

The final query can look like this:

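from django.contrib.gis.db.models.functions import Distance
from django.contrib.gis.measure import D

# Passing D(km=50) to dwithin assumes the geography=True field from step 1.
current_location = me.location
people = Person.objects.filter(
    location__dwithin=(current_location, D(km=50))
).annotate(
    distance=Distance('location', current_location)
).order_by('distance')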
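To make steps 2 to 4 concrete outside the database, here is a minimal plain-Python sketch of the same filter, sort, slice sequence. It is not what PostGIS does internally: this version still computes a distance for every item, whereas the spatial index lets dwithin skip far-away rows. The function names, the sample places and the 6371 km Earth radius are my own assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_within(origin, people, radius_km, start, end):
    """Return names within radius_km of origin, nearest first, sliced to one page.

    people is a list of (name, (lat, lon)) tuples.
    """
    scored = ((haversine_km(origin, loc), name) for name, loc in people)
    # Filter first (the role dwithin plays), so only nearby rows get sorted.
    nearby = [(dist, name) for dist, name in scored if dist <= radius_km]
    nearby.sort()
    return [name for _, name in nearby[start:end]]


# Hypothetical sample data: places around London.
people = [
    ("Paris", (48.8566, 2.3522)),
    ("Greenwich", (51.4826, 0.0077)),
    ("Reading", (51.4543, -0.9781)),
    ("Croydon", (51.3762, -0.0982)),
]
london = (51.5074, -0.1278)
first_page = nearest_within(london, people, radius_km=50, start=0, end=10)
```

With a 50 km radius only the two closest places survive the filter, so the sort touches two items instead of four. The same effect is what keeps the ORDER BY cheap on a large table.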

P.S.: Rather than writing custom pagination, it is more efficient to use the pagination methods that Django provides for views.

Alternatively, you can use Django REST Framework and its pagination.

Answered By: John Moutafis