How to implement Priority Queues in Python?
Question:
Sorry for such a silly question but Python docs are confusing…
Link 1: Queue Implementation
http://docs.python.org/library/queue.html
It says that Queue has a class for the priority queue. But I could not find how to implement it.
class Queue.PriorityQueue(maxsize=0)
Link 2: Heap Implementation
http://docs.python.org/library/heapq.html
Here they say that we can implement priority queues indirectly using heapq
import itertools
from heapq import heappush, heappop

pq = []                         # list of entries arranged in a heap
entry_finder = {}               # mapping of tasks to entries
REMOVED = '<removed-task>'      # placeholder for a removed task
counter = itertools.count()     # unique sequence count

def add_task(task, priority=0):
    'Add a new task or update the priority of an existing task'
    if task in entry_finder:
        remove_task(task)
    count = next(counter)
    entry = [priority, count, task]
    entry_finder[task] = entry
    heappush(pq, entry)

def remove_task(task):
    'Mark an existing task as REMOVED. Raise KeyError if not found.'
    entry = entry_finder.pop(task)
    entry[-1] = REMOVED

def pop_task():
    'Remove and return the lowest priority task. Raise KeyError if empty.'
    while pq:
        priority, count, task = heappop(pq)
        if task is not REMOVED:
            del entry_finder[task]
            return task
    raise KeyError('pop from an empty priority queue')
Which is the most efficient priority queue implementation in Python? And how to implement it?
Answers:
There is no such thing as a "most efficient priority queue implementation" in any language.
A priority queue is all about trade-offs. See http://en.wikipedia.org/wiki/Priority_queue
You should choose one of these two, based on how you plan to use it:
O(log(N)) insertion time and O(1) (findMin+deleteMin)* time, or
O(1) insertion time and O(log(N)) (findMin+deleteMin)* time
(* sidenote: the findMin time of most queues is almost always O(1), so
here I mostly mean the deleteMin time can either be O(1) quick if the
insertion time is O(log(N)) slow, or the deleteMin time must be
O(log(N)) slow if the insertion time is O(1) fast. One should note that
both may also be unnecessarily slow like with binary-tree based
priority queues.)
In the latter case, you can choose to implement a priority queue with a Fibonacci heap: http://en.wikipedia.org/wiki/Heap_(data_structure)#Comparison_of_theoretic_bounds_for_variants (as you can see, heapq, which is basically a binary tree, must necessarily have O(log(N)) for both insertion and findMin+deleteMin).
If you are dealing with data with special properties (such as bounded data), then you can achieve O(1) insertion and O(1) findMin+deleteMin time. You can only do this with certain kinds of data because otherwise you could abuse your priority queue to violate the O(N log(N)) bound on sorting. vEB trees fall under a similar category, since you have a maximum set size (the M in O(log(log(M))) is not the number of elements, but the maximum number of elements) and thus you cannot circumvent the theoretical O(N log(N)) general-purpose comparison-sorting bound.
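As a rough illustration of the bounded-data case, here is a minimal bucket-queue sketch. It assumes priorities are small non-negative integers known in advance; the class name BucketQueue and the num_priorities parameter are made up for this example. Insertion is O(1), and extraction scans at most num_priorities buckets, which is constant with respect to the number of queued items.

```python
from collections import deque


class BucketQueue:
    """Hypothetical sketch: priorities must be ints in range(num_priorities)."""

    def __init__(self, num_priorities):
        # One FIFO bucket per possible priority value.
        self.buckets = [deque() for _ in range(num_priorities)]
        self.size = 0

    def insert(self, priority, value):
        if not 0 <= priority < len(self.buckets):
            raise ValueError('priority out of range')
        # O(1): append to the bucket for this priority.
        self.buckets[priority].append(value)
        self.size += 1

    def extract_min(self):
        # Scans at most num_priorities buckets, however many items are queued.
        for bucket in self.buckets:
            if bucket:
                self.size -= 1
                return bucket.popleft()
        raise KeyError('pop from an empty priority queue')
```

This only works because the priorities are bounded; with arbitrary comparable keys you are back to the comparison-sorting bound.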
To implement any queue in any language, all you need is to define the insert(value) and extractMin() -> value operations. This generally just involves a minimal wrapping of the underlying heap; see http://en.wikipedia.org/wiki/Fibonacci_heap to implement your own, or use an off-the-shelf library of a similar heap like a Pairing Heap (a Google search revealed http://svn.python.org/projects/sandbox/trunk/collections/pairing_heap.py )
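For instance, a minimal wrapper over the standard heapq module might look like the sketch below. The class name MinQueue and the method names are illustrative only, and the underlying heap here is a binary heap, not a Fibonacci or pairing heap.

```python
import heapq


class MinQueue:
    """Hypothetical minimal wrapper around a heapq-managed list."""

    def __init__(self):
        self._heap = []

    def insert(self, value):
        # O(log N) push onto the binary heap.
        heapq.heappush(self._heap, value)

    def extract_min(self):
        # O(log N) pop of the smallest value; raises IndexError if empty.
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)
```

Swapping in a different heap would only change the bodies of insert and extract_min; callers would not need to change.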
If you only care about which of the two you referenced is more efficient (the heapq-based code from http://docs.python.org/library/heapq.html#priority-queue-implementation-notes which you included above, versus Queue.PriorityQueue), then:
There doesn’t seem to be any easily-findable discussion on the web as to what Queue.PriorityQueue is actually doing; you would have to source dive into the code, which is linked to from the help documentation: http://hg.python.org/cpython/file/2.7/Lib/Queue.py
    def _put(self, item, heappush=heapq.heappush):
        heappush(self.queue, item)

    def _get(self, heappop=heapq.heappop):
        return heappop(self.queue)
As we can see, Queue.PriorityQueue is also using heapq as an underlying mechanism. Therefore they are equally bad (asymptotically speaking). Queue.PriorityQueue may allow for parallel queries, so I would wager that it has slightly more constant-factor overhead. But because you know the underlying implementation (and asymptotic behavior) must be the same, the simplest way to compare them is to run both on the same large dataset.
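A rough way to run that comparison is sketched below. It assumes Python 3, where the Queue module is named queue; the helper names drain_heapq, drain_priority_queue and compare are made up for this example, and the timings will vary by machine.

```python
import heapq
import queue  # this module is named Queue in Python 2
import random
import timeit


def drain_heapq(data):
    # Push everything onto a plain list heap, then pop it all back off.
    heap = []
    for item in data:
        heapq.heappush(heap, item)
    return [heapq.heappop(heap) for _ in range(len(heap))]


def drain_priority_queue(data):
    # Same workload through the locking PriorityQueue wrapper.
    pq = queue.PriorityQueue()
    for item in data:
        pq.put(item)
    return [pq.get() for _ in range(pq.qsize())]


def compare(n=100000, repeat=3):
    # Returns (best heapq time, best PriorityQueue time) in seconds.
    data = [random.random() for _ in range(n)]
    t_heapq = min(timeit.repeat(lambda: drain_heapq(data),
                                number=1, repeat=repeat))
    t_queue = min(timeit.repeat(lambda: drain_priority_queue(data),
                                number=1, repeat=repeat))
    return t_heapq, t_queue
```

Calling compare() returns the two best-of-three timings; both functions return the items in sorted order, so the only difference being measured is the wrapper overhead.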
(Do note that Queue.PriorityQueue does not seem to have a way to remove entries, while heapq does. However this is a double-edged sword: Good priority queue implementations might possibly allow you to delete elements in O(1) or O(log(N)) time, but if you use the remove_task function you mention, and let those zombie tasks accumulate in your queue because you aren’t extracting them off the min, then you will see asymptotic slowdown which you wouldn’t otherwise see. Of course, you couldn’t do this with Queue.PriorityQueue in the first place, so no comparison can be made here.)
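To make the zombie-task point concrete, here is a small sketch based on the lazy-deletion pattern from the heapq docs quoted in the question. The task name 'job' and the loop bounds are arbitrary; the point is only that re-prioritising without popping leaves removed entries in the heap.

```python
import itertools
from heapq import heappush

pq = []
entry_finder = {}
REMOVED = '<removed-task>'
counter = itertools.count()


def add_task(task, priority=0):
    # Same idea as add_task/remove_task in the question: mark the old
    # entry as removed instead of deleting it from the heap.
    if task in entry_finder:
        entry_finder.pop(task)[-1] = REMOVED
    entry = [priority, next(counter), task]
    entry_finder[task] = entry
    heappush(pq, entry)


# Re-prioritise a single task 1000 times without ever popping.
for priority in range(1000, 0, -1):
    add_task('job', priority)

live = len(entry_finder)                                  # 1 live task
stale = sum(1 for entry in pq if entry[-1] is REMOVED)    # 999 zombies
```

The heap holds 1000 entries for one live task, so every later push or pop pays O(log(1000)) rather than O(log(1)).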
The version in the Queue module is implemented using the heapq module, so they have equal efficiency for the underlying heap operations.
That said, the Queue version is slower because it adds locks, encapsulation, and a nice object-oriented API.
The priority queue suggestions shown in the heapq docs are meant to show how to add additional capabilities to a priority queue (such as sort stability and the ability to change the priority of a previously enqueued task). If you don’t need those capabilities, then the basic heappush and heappop functions will give you the fastest performance.
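As a small sketch of the sort-stability capability mentioned above, the snippet below adds a running count as a tie-breaker so that tasks with equal priority come out in insertion order. The task names are arbitrary examples.

```python
import itertools
from heapq import heappush, heappop

counter = itertools.count()
heap = []

for priority, task in [(2, 'write'), (1, 'plan'), (2, 'review'), (1, 'design')]:
    # The count breaks ties, so the task itself is never compared.
    heappush(heap, (priority, next(counter), task))

order = [heappop(heap)[-1] for _ in range(len(heap))]
# order == ['plan', 'design', 'write', 'review']
```

Without the count, ties would fall through to comparing the tasks themselves, which reorders strings alphabetically and raises TypeError in Python 3 for tasks that cannot be compared.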
Although this question has already been answered and an answer accepted, here is a simple custom implementation of a priority queue that uses no modules, to show how it works.
# class for Node with data and priority
class Node:

    def __init__(self, info, priority):
        self.info = info
        self.priority = priority

# class for Priority queue
class PriorityQueue:

    def __init__(self):
        self.queue = list()
        # if you want you can set a maximum size for the queue

    def insert(self, node):
        # if queue is empty
        if self.size() == 0:
            # add the new node
            self.queue.append(node)
            return True
        else:
            # traverse the queue to find the right place for new node
            for x in range(0, self.size()):
                # if the priority of new node is greater or equal
                if node.priority >= self.queue[x].priority:
                    # if we have traversed the complete queue
                    if x == (self.size()-1):
                        # add new node at the end
                        self.queue.insert(x+1, node)
                        return True
                    else:
                        continue
                else:
                    self.queue.insert(x, node)
                    return True

    def delete(self):
        # remove the first node from the queue
        return self.queue.pop(0)

    def show(self):
        for x in self.queue:
            print(str(x.info) + " - " + str(x.priority))

    def size(self):
        return len(self.queue)
Find the complete code and explanation here: https://www.studytonight.com/post/implementing-priority-queue-in-python (Updated URL)
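The list-based queue above keeps its items sorted by scanning linearly on every insert. A possible variant, sketched here with the standard bisect module rather than the hand-written loop, finds the slot by binary search. The class name SortedListQueue is made up for this example, and this is a substitute technique, not the code from the linked article.

```python
import bisect
import itertools


class SortedListQueue:
    """Hypothetical variant that keeps a sorted list using bisect."""

    def __init__(self):
        self._items = []
        self._counter = itertools.count()

    def insert(self, info, priority):
        # Binary search finds the slot in O(log N); shifting the list
        # to make room is still O(N). The count keeps equal priorities
        # in insertion order.
        bisect.insort(self._items, (priority, next(self._counter), info))

    def delete(self):
        # Remove and return the lowest-priority item; pop(0) is O(N).
        priority, _, info = self._items.pop(0)
        return info, priority

    def size(self):
        return len(self._items)
```

Either way the list must shift elements on insert and on pop(0), so a heap remains the better choice for large queues.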