Time complexity iterating over items in a SortedDict?

Question:

from sortedcontainers import SortedDict

d = SortedDict(b=20, d=30, c=10, e=50, a=40)

# What is the time complexity of the following code?
for k, v in d.items():
    print(k, v)

I think the time complexity should be nlog(n) as getting an entry from a sorted dictionary costs log(n), and even though we’re iterating over this dictionary, we essentially perform the get operation n times. Is my understanding correct?

Asked By: Jeff Xi

||

Answers:

Any for loop iteraing through objects has a complexity of O(n), when working with dictionaries this is the same. For each of the dictionary items, you loop once.

Answered By: Alvaro Lopez

If I’m understanding the code correctly, you can iterate over the items of a SortedDict in O(n).

Internally, it uses a SortedList, which can iterate over all its elements in O(n) time. (SortedList is implemented as a list of lists, and it uses itertools.chain_iterable() to turn that into a single generator.) Once it identifies the correct item to access, it can look it up in a hash table, same as an ordinary dict. (The source code says "sorted dict inherits from dict to store items and maintains a sorted list of keys.")

How can this be possible, when any sorting algorithm based on comparisons must take least O(n log n)? When inserting into the SortedDict, that can take O(log n), since that’s what the SortedList takes for insertion. So inserting n items takes O(n log n), but iterating over them is only O(n).

Answered By: Nick ODell

SortedDict.items() calls SortedItemsView(self), the constructor of which is inherited from collections.abc.MappingView via collections.abc.ItemsView, and ItemsView has the following special method:

def __iter__(self):
    for key in self._mapping:
        yield (key, self._mapping[key])

So you’re right that it is doing a lookup at each step. Here, self._mapping is a SortedDict. However, since SortedDict is a subclass of dict that does not override __getitem__, it uses the standard dict.__getitem__, which is O(1) on average, better than O(log n).

Also note that the for key in self._mapping: above calls sortedDict.__iter__, which calls SortedList.__iter__, which calls iterools.chain.from_iterable, which runs in linear time.

Answered By: Dennis