Django Rest Framework pagination extremely slow count

Question:

I turned pagination on in Django Rest framework and it appears to be incredibly slow. Count looks like the culprit, and is taking hundreds of milliseconds to return each time due to the millions of rows in the tables.

I am using postgresql as the database. Is there any way to not count the rows and still use pagination? The performance was fine before this was enabled if I manually filtered the queryset.

Asked By: Smills

||

Answers:

Override the get_paginated_response method of your pagination class, and do not include the count. You can refer to the base implementation of the PageNumberPagination class to see what you should return.

from rest_framework.pagination import PageNumberPagination
from collections import OrderedDict # requires Python 2.7 or later

class PageNumberPaginationWithoutCount(PageNumberPagination):
    # Set any other options you want here like page_size

    def get_paginated_response(self, data):
        return Response(OrderedDict([
            ('next', self.get_next_link()),
            ('previous', self.get_previous_link()),
            ('results', data)
        ]))

Then in your settings.py, set DEFAULT_PAGINATION_CLASS to your new pagination class.

DEFAULT_PAGINATION_CLASS = 'path.to.PageNumberPaginationWithoutCount'

This approach is used in the example in the pagination docs.

Edit: from the comments below it sounds like this might not be enough to prevent the slow sql query, so you might need to override paginate_queryset as well.

Answered By: Alasdair

If you are ok without count, next and previous links, Following custom class can be used.

import sys
from collections import OrderedDict

from django.core.paginator import Paginator
from django.utils.functional import cached_property
from rest_framework.pagination import PageNumberPagination
from rest_framework.response import Response


class CustomPaginatorClass(Paginator):
    @cached_property
    def count(self):
        return sys.maxsize


# To Avoid large table count query, We can use this paginator class
class LargeTablePagination(PageNumberPagination):
    django_paginator_class = CustomPaginatorClass

    def get_paginated_response(self, data):
        return Response(OrderedDict([
            ('page', self.page.number),
            ('results', data)
        ]))
Answered By: xrage

The issue is, that the query used to count is the same potentially complex one as used to fetch the data. That’s rather wasteful. PageNumberPagination uses Django’s own Paginator internally.

To make the query for the count simpler override the paginator class DRF uses:

from django.core.paginator import Paginator
from django.utils.functional import cached_property
from rest_framework.pagination import PageNumberPagination

class FasterDjangoPaginator(Paginator):
    @cached_property
    def count(self):
        # only select 'id' for counting, much cheaper
        return self.object_list.values('id').count()


class FasterPageNumberPagination(PageNumberPagination):
    django_paginator_class = FasterDjangoPaginator
Answered By: Florian

The other answers either didn’t work for me or were still performing the extra COUNT(*) query.

This will get rid of all pagination, the count query, and will just return the JSON response:

from rest_framework.pagination import PageNumberPagination


class NoCountPagination(PageNumberPagination):
    page_size = None

    def get_paginated_response(self, data):
        return Response({
            'results', data
        })

To use it:

from rest_framework import viewsets
from .models import MyModel
from .serializers import MySerializer


class CustomApiViewSet(viewsets.ReadOnlyModelViewSet):
    """
    Simple viewset for viewing MyModels (as a list, or individually).
    """
    queryset = MyModel.objects.all()
    serializer_class = MySerializer
    pagination_class = NoCountPagination

Beware, this will return all rows from your queryset. In almost all cases, I think it’s probably better to just use PageNumberPagination as-is or maybe use @Florian’s solution above to speed it up a bit.

Answered By: getup8

Adding to getup8’s response, I was able to get it working where it doesn’t return the count but also doesn’t return all of the rows (assuming you have PAGE_SIZE set to a reasonable number in your site’s REST_FRAMEWORK settings).

from rest_framework.pagination import LimitOffsetPagination
    
class NoCountPaginator(LimitOffsetPagination):
    def get_count(self, queryset):
        return 99999999

    def get_paginated_response(self, data):
        return Response(OrderedDict([
            ('results', data)
        ]))

And to use it:

from rest_framework import viewsets
from .models import MyModel
from .serializers import MySerializer


class CustomApiViewSet(viewsets.ReadOnlyModelViewSet):
    queryset = MyModel.objects.all()
    serializer_class = MySerializer
    pagination_class = NoCountPaginator

I had PAGE_SIZE set to 25, so the API always returns only the top 25 rows and it no longer runs the count query.

Answered By: Scott Weaver

This is an example of a paginator that keeps the next/previous links working. It fetches an extra row at the end to check whether there is another page:

class NoCountPagination(LimitOffsetPagination):
    def get_paginated_response(self, data):
        return Response(
            {
                "next": self.get_next_link(),
                "previous": self.get_previous_link(),
                "results": data,
            }
        )

    def paginate_queryset(self, queryset, request, view=None):
        self.offset = self.get_offset(request)
        self.limit = self.get_limit(request)

        # Get one extra element to check if there is a "next" page
        q = list(queryset[self.offset : self.offset + self.limit + 1])
        self.count = self.offset + len(q) if len(q) else self.offset - 1
        if len(q) > self.limit:
            q.pop()

        self.request = request
        if self.count > self.limit and self.template is not None:
            self.display_page_controls = True

        return q

    def get_paginated_response_schema(self, schema):
        ret = super().get_paginated_response_schema(schema)
        del ret["properties"]["count"]
        return ret
Answered By: Paolo Bonzini