Why is deque implemented as a linked list instead of a circular array?

Question:

CPython deque is implemented as a doubly-linked list of 64-item sized “blocks” (arrays). The blocks are all full, except for the ones at either end of the linked list. IIUC, the blocks are freed when a pop / popleft removes the last item in the block; they are allocated when append/appendleft attempts to add a new item and the relevant block is full.
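To make the description concrete, here is a minimal Python sketch of that layout (not CPython's actual C code): a doubly-linked list of fixed-size blocks, with only `append` and `popleft` shown. `BLOCKSIZE`, `_Block`, and `BlockDeque` are illustrative names; CPython uses 64-slot blocks, a small value is used here to make the structure easy to trace.

```python
BLOCKSIZE = 4  # CPython uses 64; small here for readability

class _Block:
    def __init__(self):
        self.data = [None] * BLOCKSIZE
        self.prev = None
        self.next = None

class BlockDeque:
    def __init__(self):
        block = _Block()
        self.left = self.right = block
        # indices of the occupied range inside the end blocks
        self.leftindex = BLOCKSIZE // 2
        self.rightindex = BLOCKSIZE // 2 - 1
        self.size = 0

    def append(self, value):
        if self.rightindex == BLOCKSIZE - 1:  # right block full: link a new one
            block = _Block()
            block.prev = self.right
            self.right.next = block
            self.right = block
            self.rightindex = -1
        self.rightindex += 1
        self.right.data[self.rightindex] = value
        self.size += 1

    def popleft(self):
        if self.size == 0:
            raise IndexError("pop from an empty deque")
        value = self.left.data[self.leftindex]
        self.leftindex += 1
        self.size -= 1
        if self.leftindex == BLOCKSIZE and self.size > 0:
            # left block drained: unlink it (CPython would recycle it)
            self.left = self.left.next
            self.left.prev = None
            self.leftindex = 0
        if self.size == 0:  # recenter indices in the single remaining block
            self.leftindex = BLOCKSIZE // 2
            self.rightindex = BLOCKSIZE // 2 - 1
        return value
```

Every operation touches only an end block, so each push or pop is a bounded amount of work regardless of the deque's size.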

I understand the listed advantages of using a linked list of blocks rather than a linked list of items:

  • reduce memory cost of pointers to prev and next in every item
  • reduce runtime cost of doing malloc/free for every item added/removed
  • improve cache locality by placing consecutive pointers next to each other

But why wasn’t a single dynamically-sized circular array used instead of the doubly-linked list in the first place?

AFAICT, the circular array would preserve all the above advantages, and maintain the (amortized) cost of pop*/append* at O(1) (by overallocating, just like in list). In addition, it would improve the cost of lookup by index from the current O(n) to O(1). A circular array would also be simpler to implement, since it can reuse much of the list implementation.
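For comparison, here is a minimal sketch of the alternative the question proposes, a circular (ring-buffer) deque. The class and names are hypothetical; the point is that lookup by index becomes O(1) modular arithmetic, while growth requires an O(n) copy (hence amortized, not worst-case, O(1) appends).

```python
class RingDeque:
    """Illustrative circular-array deque; not how CPython implements deque."""

    def __init__(self, capacity=8):
        self.buf = [None] * capacity
        self.head = 0   # physical index of logical element 0
        self.size = 0

    def __getitem__(self, i):  # O(1), unlike the linked-block layout
        if not 0 <= i < self.size:
            raise IndexError(i)
        return self.buf[(self.head + i) % len(self.buf)]

    def append(self, value):
        if self.size == len(self.buf):
            self._grow()
        self.buf[(self.head + self.size) % len(self.buf)] = value
        self.size += 1

    def popleft(self):
        if self.size == 0:
            raise IndexError("pop from empty deque")
        value = self.buf[self.head]
        self.buf[self.head] = None
        self.head = (self.head + 1) % len(self.buf)
        self.size -= 1
        return value

    def _grow(self):
        # Doubling keeps appends amortized O(1), but this copy is the
        # occasional O(n) spike a linked-block deque never pays.
        new = [None] * (2 * len(self.buf))
        for i in range(self.size):
            new[i] = self.buf[(self.head + i) % len(self.buf)]
        self.buf = new
        self.head = 0
```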

I can see an argument in favor of a linked list in languages like C++, where removal of an item from the middle can be done in O(1) using a pointer or iterator; however, python deque has no API to do this.

Asked By: max


Answers:

Adapted from my reply on the python-dev mailing list:

The primary point of a deque is to make popping and pushing at both ends efficient. That’s what the current implementation does: worst-case constant time per push or pop regardless of how many items are in the deque. That beats “amortized O(1)” in the small and in the large. That’s why it was done this way.

Some other deque methods are consequently slower than they are for lists, but who cares? For example, the only indices I’ve ever used with a deque are 0 and -1 (to peek at one end or the other of a deque), and the implementation makes accessing those specific indices constant-time too.
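That peeking pattern looks like this; indices 0 and -1 live in the end blocks, so accessing them is constant-time even though general indexing is O(n):

```python
from collections import deque

d = deque(range(1000))
assert d[0] == 0        # peek at the left end without popping
assert d[-1] == 999     # peek at the right end without popping

d.appendleft(-1)
d.append(1000)
assert (d[0], d[-1]) == (-1, 1000)
```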

Indeed, the message from Raymond Hettinger referenced by Jim Fasarakis Hilliard in his comment:

https://www.mail-archive.com/[email protected]/msg25024.html

confirms that

The only reason that __getitem__ was put in was to support fast access to the head and tail without actually popping the value

Answered By: Tim Peters

In addition to accepting @TimPeters answer, I wanted to add a couple of additional observations that don’t fit into a comment format.

Let’s just focus on a common use case where deque is used as a simple FIFO queue.

Once the queue reaches its peak size, the circular array needs no further heap allocations. I thought of this as an advantage, but it turns out the CPython implementation achieves the same effect by keeping a list of reusable memory blocks. A tie.

While the queue size is growing, both the circular array and the CPython implementation need memory from the heap. CPython needs only a simple malloc for each new block, while the array needs a (potentially much more expensive) realloc: unless extra space happens to be available at the end of the original allocation, it must allocate a new region, copy all the data over, and free the old memory. Advantage to CPython.
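A sketch of why the growth step can be expensive: once the ring buffer's contents wrap around the end of the allocation, even an in-place `realloc` would not be enough, since the wrapped prefix has to be moved. The `grow` helper below is illustrative, not CPython code:

```python
def grow(buf, head, size):
    """Double the capacity of a circular buffer, un-wrapping its contents.

    Every live element is copied, so this step is O(n).
    """
    new_buf = [None] * (2 * len(buf))
    for i in range(size):
        new_buf[i] = buf[(head + i) % len(buf)]
    return new_buf, 0  # new head is 0: contents are now contiguous
```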

If the queue peaked out at a much larger size than its stable size, both CPython and the array implementation would waste the unused memory (the former by saving it in a reusable block list, the latter by leaving the unused empty space in the array). A tie.

As @TimPeters pointed out, the cost of each FIFO queue put / get is always O(1) for CPython, but only amortized O(1) for the array. Advantage to CPython.
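The FIFO pattern under discussion, for reference: each put is an `append` and each get a `popleft`, and with `collections.deque` both are worst-case O(1):

```python
from collections import deque

q = deque()
for item in ["a", "b", "c"]:
    q.append(item)           # put
assert q.popleft() == "a"    # get: first in, first out
assert q.popleft() == "b"
assert list(q) == ["c"]
```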

Answered By: max