Python equivalent of std::set and std::multimap

Question:

I’m porting a C++ program to Python. There are some places where it uses std::set to store objects that define their own comparison operators. Since the Python standard library has no equivalent of std::set (a sorted key-value mapping data structure) I tried using a normal dictionary and then sorting it when iterating, like this:

def __iter__(self):
    items = self._data.items()
    items.sort()
    return iter(items)

However, profiling has shown that all the calls from .sort() to __cmp__ are a serious bottleneck. I need a better data structure – essentially a sorted dictionary. Does anyone know of an existing implementation? Failing that, any recommendations on how I should implement this? Read performance is more important than write performance and time is more important than memory.

Bonus points if it supports multiple values per key, like the C++ std::multimap.

Note that the OrderedDict class doesn’t fit my needs, because it returns items in the order of insertion, whereas I need them sorted using their __cmp__ methods.

Asked By: EMP

||

Answers:

Python does not have built-in data-structures for this, though the bisect module provides functionality for keeping a sorted list with appropriately efficient algorithms.

If you have a list of sorted keys, you can couple it with a collections.defaultdict(list) to provide multimap-like functionality.

Answered By: Mike Graham

In his book “Programming in Python 3“, Mark Summerfield introduces a sorted dictionary class. The source code is available in this zip archive – look for SortedDict.py. The SortedDict class is described in detail in the book (which I recommend very much). It supports arbitrary keys for comparison and multiple values per key (which any dictionary in Python does, so that’s not that big a deal, I think).

Answered By: Tim Pietzcker

You should use sort(key=...).
The key function you use will be related to the cmp you are already using. The advantage is that the key function is called n times whereas the cmp is called nlog n times, and typically key does half the work that cmp does

If you can include your __cmp__() we can probably show you how to convert it to a key function

If you are doing lots of iterations between modifications, you should cache the value of the sorted items.

Answered By: John La Rooy

For the sorted dictionary, you can (ab)use the stable nature of python’s timsort: basically, keep the items partially sorted, append items at the end when needed, switching a “dirty” flag, and sort the remaining before iterating. See this entry for details and implementation (A Martelli’s answer):
Key-ordered dict in Python

Answered By: LeMiz

This is a late post but if anyone is looking for this now, here you go: https://grantjenks.com/docs/sortedcontainers/

This is not a built-in but just an easy pip install. It has sorted dicts and lists both with full support for insert, delete, indexing and binary search. Most of the operations have amortised O(log(n)) complexity.

Answered By: Soumyajit