Understanding how to create a heap in Python

Question:

The collections.Counter.most_common method in Python uses the heapq module (when it is called with an argument n) to return, for instance, the most common words in a file.
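Here is a minimal sketch of that relationship. It assumes CPython's current implementation, in which most_common(n) delegates to heapq.nlargest keyed on the count, and most_common() with no argument sorts instead. The sample text is the one used in the first answer below.

```python
import heapq
from collections import Counter
from operator import itemgetter

text = "the quick brown fox jumped over the the quick brown quick log log"
counts = Counter(text.split())

# most_common(n) picks the n (word, count) pairs with the largest counts.
top_counter = counts.most_common(3)

# The same selection done directly with heapq.nlargest, keyed on the count.
top_heapq = heapq.nlargest(3, counts.items(), key=itemgetter(1))

print(top_counter)
print(top_heapq)
```

Both lines print the same three pairs. Words with equal counts keep the order in which they were first seen in the text.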

I have traced through the heapq.py file, but I'm having a bit of trouble understanding how a heap is created and updated with respect to, let's say, words.

So I think the best way for me to understand it is to figure out how to create a heap from scratch.

Can someone provide pseudocode for creating a heap that would represent word counts?

Asked By: Sam Hammamy


Answers:

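This is a slightly modified version of the code found here: http://code.activestate.com/recipes/577086-heap-sort/

HeapSort(A, T) sorts the list of words A in ascending order of how often each word appears in the text T, using a max-heap keyed on the word count.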

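from collections import Counter

def HeapSort(A, T):
    # Count every word once, instead of rescanning T (and matching
    # substrings, as str.count does) on every comparison.
    counts = Counter(T.split())

    def heapify(A):
        # Sift down every parent node, starting from the last one.
        start = (len(A) - 2) // 2
        while start >= 0:
            siftDown(A, start, len(A) - 1)
            start -= 1

    def siftDown(A, start, end):
        # Restore the max-heap property (by word count) below start.
        root = start
        while root * 2 + 1 <= end:
            child = root * 2 + 1
            # Pick the child with the larger count.
            if child + 1 <= end and counts[A[child]] < counts[A[child + 1]]:
                child += 1
            if counts[A[root]] < counts[A[child]]:
                A[root], A[child] = A[child], A[root]
                root = child
            else:
                return

    heapify(A)
    end = len(A) - 1
    while end > 0:
        # Move the current maximum to the end, then repair the smaller heap.
        A[end], A[0] = A[0], A[end]
        siftDown(A, 0, end - 1)
        end -= 1


if __name__ == '__main__':
    text = "the quick brown fox jumped over the the quick brown quick log log"
    heap = list(set(text.split()))
    print(heap)

    HeapSort(heap, text)
    print(heap)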

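Output (the first list follows set iteration order, so it, and the relative order of words with equal counts, can vary between runs):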

['brown', 'log', 'jumped', 'over', 'fox', 'quick', 'the']
['jumped', 'fox', 'over', 'brown', 'log', 'the', 'quick']

You can visualize the program here:
http://goo.gl/2a9Bh

Answered By: Joran Beasley

In Python 2.x and 3.x, heaps are supported through the standard-library module heapq. It supplies functions for working with a heap stored in an ordinary Python list.
Example:

>>> from heapq import heappush, heappop
>>> heap = []
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> for item in data:
...     heappush(heap, item)
...
>>> ordered = []
>>> while heap:
...     ordered.append(heappop(heap))
...
>>> ordered
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> data.sort()
>>> data == ordered
True

You can find out more about the heap functions heappush, heappop, heappushpop, heapify and heapreplace in the heapq module documentation.
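Applied to the word-count question, one common pattern keeps a min-heap of at most k (count, word) pairs. Once the heap is full, heappushpop pushes the new pair and pops the smallest one, so the heap ends up holding the k most frequent words. This is a sketch, not what most_common literally does, and the helper name top_k_words is hypothetical.

```python
import heapq
from collections import Counter


def top_k_words(text, k):
    """Hypothetical helper: the k most frequent words as (count, word) pairs.

    Keeps a min-heap of size k, so heap[0] is always the weakest of the
    current top-k candidates.
    """
    counts = Counter(text.split())
    heap = []
    for word, count in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, word))
        else:
            # Push the new pair, then pop the smallest of the k + 1 pairs.
            heapq.heappushpop(heap, (count, word))
    # The heap is only partially ordered; sort the survivors, largest first.
    return sorted(heap, reverse=True)


text = "the quick brown fox jumped over the the quick brown quick log log"
print(top_k_words(text, 3))
```

Because the tuples compare by count and then by word, ties are broken alphabetically here ('log' beats 'brown'), which can differ from what most_common returns.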

Answered By: Hueston Rido

Here's another variation, based on Sedgewick's Algorithms:

The heap is stored in an array where, if a node is at index k, its children are at 2*k and 2*k + 1 (and its parent is at k/2, rounded down). The first element of the array is not used, which keeps the arithmetic simple.
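To make the index arithmetic concrete, here is a small sketch of the 1-based scheme next to the 0-based scheme that heapq uses (children at 2*k + 1 and 2*k + 2). The helper names are hypothetical.

```python
def children_1based(k):
    # Slot 0 is unused, so the node at k has children at 2k and 2k + 1.
    return 2 * k, 2 * k + 1


def parent_1based(k):
    return k // 2


def children_0based(k):
    # heapq keeps the root at index 0, so every index shifts down by one.
    return 2 * k + 1, 2 * k + 2


def parent_0based(k):
    return (k - 1) // 2


# Both schemes describe the same tree: 1-based node k is 0-based node k - 1.
for k in range(1, 16):
    c1, c2 = children_1based(k)
    assert children_0based(k - 1) == (c1 - 1, c2 - 1)
    if k > 1:
        assert parent_0based(k - 1) == parent_1based(k) - 1

print(children_1based(1), children_0based(0))  # (2, 3) (1, 2)
```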

To add a new element to the heap, you append it to the end of the array and then call swim, which repeatedly exchanges it with its parent until the new element finds its place in the heap.

To delete the root, you swap it with the last element in the array, remove that last element (the old root), and then call sink on the root, which moves the swapped element down until it finds its place.

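def less(pq, i, j):
    # True if the item at index i is smaller than the item at index j.
    return pq[i] < pq[j]

def exch(pq, i, j):
    pq[i], pq[j] = pq[j], pq[i]

def swim(pq, k):
    # Move pq[k] up while its parent is smaller (max-heap order).
    while k > 1 and less(pq, k // 2, k):
        exch(pq, k, k // 2)
        k = k // 2

def sink(pq, k, n):
    # Move pq[k] down while a child is larger; n is the number of items.
    while 2 * k <= n:
        j = 2 * k
        if j < n and less(pq, j, j + 1):
            j += 1
        if not less(pq, k, j):
            break
        exch(pq, k, j)
        k = j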
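Putting the pieces together, here is a self-contained Python sketch of the whole structure: a 1-based array with slot 0 unused, insert as append plus swim, and delete-max as swap, remove, and sink. The class name MaxPQ and the use of (count, word) tuples for word counts are assumptions of this sketch, not part of the answer above.

```python
from collections import Counter


class MaxPQ:
    """Max-heap stored in a Python list with slot 0 unused (1-based)."""

    def __init__(self):
        self.pq = [None]  # placeholder so the root lives at index 1

    def __len__(self):
        return len(self.pq) - 1

    def _less(self, i, j):
        return self.pq[i] < self.pq[j]

    def _exch(self, i, j):
        self.pq[i], self.pq[j] = self.pq[j], self.pq[i]

    def _swim(self, k):
        # Move the item at k up while its parent is smaller.
        while k > 1 and self._less(k // 2, k):
            self._exch(k, k // 2)
            k //= 2

    def _sink(self, k):
        # Move the item at k down while one of its children is larger.
        n = len(self)
        while 2 * k <= n:
            j = 2 * k
            if j < n and self._less(j, j + 1):
                j += 1  # pick the larger child
            if not self._less(k, j):
                break
            self._exch(k, j)
            k = j

    def insert(self, item):
        # Append at the end of the array, then swim it up to its place.
        self.pq.append(item)
        self._swim(len(self))

    def del_max(self):
        # Swap the root with the last item, remove it, then sink the new root.
        # Raises IndexError if the queue is empty.
        self._exch(1, len(self))
        top = self.pq.pop()
        self._sink(1)
        return top


text = "the quick brown fox jumped over the the quick brown quick log log"
pq = MaxPQ()
for word, count in Counter(text.split()).items():
    pq.insert((count, word))  # tuples compare by count first, then by word

ranked = [pq.del_max() for _ in range(len(pq))]
print(ranked)
```

Since the tuples compare by count and then by word, words with equal counts come out in reverse alphabetical order.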

Here's a visualization of heap insertion, inserting the first 15 letters of the alphabet (a to o):

[Image: heap insert visualization]

Answered By: slashdottir

Your confusion may come from the fact that the Python module heapq does not define a heap as a data type (a class) with its own methods (e.g. as in a deque or a list). It instead provides functions that you can run on a Python list.

It's best to think of heapq as a module providing a set of algorithms (functions) that interpret lists as heaps and manipulate them accordingly. Heaps are commonly represented as arrays, and Python lists already serve that purpose, so it makes sense for heapq to provide functions that manipulate lists as heaps.

Let’s see this with an example. Starting with a simple Python list:

>>> my_list = [2, -1, 4, 10, 0, -20]

To create a heap with heapq from my_list, we just need to call heapify, which rearranges the elements of the list in place to form a min-heap:

>>> import heapq
>>> # NOTE: heapify works in place and returns None:
>>> heapq.heapify(my_list)

Note that the heap is just my_list itself: heapify has only rearranged the elements of that list in place:

>>> my_list
[-20, -1, 2, 10, 0, 4]
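One way to see what "form a min-heap" means is to check heapq's documented invariant, heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2], before and after calling heapify. The helper is_min_heap is a hypothetical name for this sketch.

```python
import heapq


def is_min_heap(a):
    # heapq's invariant: every parent is <= each child that exists.
    n = len(a)
    return all(
        a[k] <= a[c]
        for k in range(n)
        for c in (2 * k + 1, 2 * k + 2)
        if c < n
    )


my_list = [2, -1, 4, 10, 0, -20]
print(is_min_heap(my_list))  # False: 2 is larger than its child -1
heapq.heapify(my_list)
print(my_list)               # [-20, -1, 2, 10, 0, 4]
print(is_min_heap(my_list))  # True
```

A valid heap is not necessarily sorted: 10 still comes before 0, because they are in different subtrees and the invariant only constrains parents against their own children.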

Popping all elements from the heap held by my_list returns them in sorted order:

>>> [heapq.heappop(my_list) for x in range(len(my_list))]
[-20, -1, 0, 2, 4, 10]
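To tie this back to word counts: heapq only provides a min-heap, so a common trick is to store negated counts, heapify the list of (-count, word) pairs, and pop the most frequent words first. This is a sketch under that assumption. heapq does not negate anything for you.

```python
import heapq
from collections import Counter

text = "the quick brown fox jumped over the the quick brown quick log log"
counts = Counter(text.split())

# Negate the counts so the smallest tuple belongs to the most frequent word.
heap = [(-count, word) for word, count in counts.items()]
heapq.heapify(heap)  # rearranges the list in place and returns None

# Pop the three most frequent words.
top3 = []
for _ in range(3):
    neg_count, word = heapq.heappop(heap)
    top3.append((word, -neg_count))
print(top3)  # [('quick', 3), ('the', 3), ('brown', 2)]
```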