How does Python decide when to call the comparator while sorting?

Question:

I have a small script that prints a message whenever the comparator is called:

from functools import cmp_to_key

def compare(a, b):
    print('comparator called')
    return a - b

mylist = [5, 1, 2, 4, 3]
sorted_list = sorted(mylist, key=cmp_to_key(compare))
print(sorted_list)


mylist = [1, 2, 3, 4, 5]
sorted_list = sorted(mylist, key=cmp_to_key(compare))
print(sorted_list)


mylist = [5, 4, 3, 2, 1]
sorted_list = sorted(mylist, key=cmp_to_key(compare))
print(sorted_list)


mylist = [5, 1, 2, 3, 4]
sorted_list = sorted(mylist, key=cmp_to_key(compare))
print(sorted_list)

Output:

comparator called
comparator called
comparator called
comparator called
comparator called
comparator called
comparator called
comparator called
[1, 2, 3, 4, 5]
comparator called
comparator called
comparator called
comparator called
[1, 2, 3, 4, 5]
comparator called
comparator called
comparator called
comparator called
[1, 2, 3, 4, 5]
comparator called
comparator called
comparator called
comparator called
comparator called
comparator called
comparator called
comparator called
[1, 2, 3, 4, 5]

You can see that, for the same input size, the number of times the comparator is called differs across the four cases.

Can someone help me understand how Python decides when to call the comparator and when not to?

Also, what are the best-, average- and worst-case time complexities?

Asked By: Saif


Answers:

Python’s default sort is Timsort, a hybrid of merge sort and insertion sort.

The code is here.

More info about the sorting algorithm here.
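To see where the question's counts (8, 4, 4 and 8) come from, here is a simplified model of what CPython does with a list shorter than 64 items: it detects the initial run (reversing it if it is strictly descending) and then binary-inserts the remaining items into that run. This is a sketch, not the real implementation: the helper name `count_short_timsort_compares` is hypothetical, it ignores merging (which only happens for longer lists), and it mirrors the run detection and binary insertion of the CPython releases that produced the question's output. The exact counts are an implementation detail and may differ in other versions.

```python
def count_short_timsort_compares(seq):
    """Hypothetical model of CPython's sort for a short list (< 64 items).

    Returns (sorted_list, number_of_less_than_comparisons).
    """
    a = list(seq)
    n = len(a)
    compares = 0

    def lt(x, y):
        # Every comparison the sort makes is a single "<" test.
        nonlocal compares
        compares += 1
        return x < y

    if n < 2:
        return a, compares

    # Step 1: find the initial run starting at index 0.
    run = 2
    if lt(a[1], a[0]):
        # Strictly descending run: extend while each item is smaller.
        while run < n and lt(a[run], a[run - 1]):
            run += 1
        a[:run] = reversed(a[:run])  # reverse it so it becomes ascending
    else:
        # Non-descending run: extend until an item is smaller.
        while run < n and not lt(a[run], a[run - 1]):
            run += 1

    # Step 2: binary insertion of each remaining item into a[:i].
    for i in range(run, n):
        pivot = a[i]
        lo, hi = 0, i
        while lo < hi:
            mid = (lo + hi) // 2
            if lt(pivot, a[mid]):
                hi = mid
            else:
                lo = mid + 1
        a[lo + 1:i + 1] = a[lo:i]  # shift the tail right by one slot
        a[lo] = pivot
    return a, compares


for data in ([5, 1, 2, 4, 3], [1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [5, 1, 2, 3, 4]):
    print(data, count_short_timsort_compares(data))
# [5, 1, 2, 4, 3] ([1, 2, 3, 4, 5], 8)
# [1, 2, 3, 4, 5] ([1, 2, 3, 4, 5], 4)
# [5, 4, 3, 2, 1] ([1, 2, 3, 4, 5], 4)
# [5, 1, 2, 3, 4] ([1, 2, 3, 4, 5], 8)
```

Sorted and strictly reverse-sorted inputs are a single run, so only the run detection (n - 1 comparisons) happens; the other two inputs yield a run of length 2, and each of the three remaining items then costs about log2 of the run length in comparisons.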

Complexity:

  • Worst and average case: O(n log n)

  • Best case: O(n), when the input is already sorted and no real sorting is required
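The best case can be observed by counting comparator calls. The sketch below (the helper `count_comparisons` is hypothetical) sorts an ascending, a strictly descending and a shuffled list of 1,000 integers. The first two are each a single run, so the sort stops after n - 1 comparisons, while the shuffled list needs many more, in line with O(n log n).

```python
import random
from functools import cmp_to_key


def count_comparisons(data):
    """Sort a copy of data and return how many times the comparator ran."""
    calls = 0

    def compare(a, b):
        nonlocal calls
        calls += 1
        return (a > b) - (a < b)  # -1, 0 or 1, like the old cmp()

    result = sorted(data, key=cmp_to_key(compare))
    assert result == sorted(data)
    return calls


n = 1000
ascending = list(range(n))
descending = ascending[::-1]
shuffled = ascending[:]
random.Random(0).shuffle(shuffled)

print(count_comparisons(ascending))   # n - 1: one pass finds a single run
print(count_comparisons(descending))  # n - 1: one strictly descending run, reversed
print(count_comparisons(shuffled))    # many more, on the order of n * log2(n)
```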

Answered By: Daniel Trugman

So, the key function is called exactly once for each item in the list. When you use cmp_to_key, however, your comparator function is wrapped in an object. Judging from the source code, cmp_to_key is equivalent to:

def cmp_to_key(mycmp):
    """Convert a cmp= function into a key= function"""
    class K(object):
        __slots__ = ['obj']
        def __init__(self, obj):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        __hash__ = None
    return K

(Note that CPython actually implements this in C, but that implementation is effectively the same.)
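One way to confirm the "key once, comparator many times" behavior is to count both. In this sketch `counting_key` is a hypothetical helper that builds the same `cmp_to_key` wrapper objects but also records how often `sorted()` asks for a key; the comparator counts its own calls.

```python
from functools import cmp_to_key

key_calls = 0
cmp_calls = 0


def compare(a, b):
    global cmp_calls
    cmp_calls += 1
    return a - b


wrap = cmp_to_key(compare)


def counting_key(item):
    """Hypothetical wrapper: count key requests, then build the K object."""
    global key_calls
    key_calls += 1
    return wrap(item)


data = [5, 1, 2, 4, 3]
print(sorted(data, key=counting_key))  # [1, 2, 3, 4, 5]
print(key_calls)  # 5: exactly one key object per item
print(cmp_calls)  # one call per comparison; depends on input order (and CPython version)
```

The key count always equals the number of items, whereas the comparator count tracks how many comparisons Timsort happened to need for that particular ordering.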

So, while the key function is called only once per item, the comparison function is called once for every comparison the sort makes. (As an aside, the sort algorithm only uses < comparisons between items, but you might use cmp_to_key for other purposes that require the full set of rich comparisons.)

CPython uses Timsort, a highly tuned, adaptive mergesort with worst-case O(N log N) behavior; for nearly sorted data, its best case, it can run in O(N).
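To check the claim that the sort only uses `<`, the sketch below defines a hypothetical `Tracked` class whose comparison methods record their own names, then sorts a few instances and inspects which methods were used.

```python
class Tracked:
    """Hypothetical value type that records which comparison methods run."""

    used = set()

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Tracked.used.add("__lt__")
        return self.value < other.value

    def __le__(self, other):
        Tracked.used.add("__le__")
        return self.value <= other.value

    def __gt__(self, other):
        Tracked.used.add("__gt__")
        return self.value > other.value

    def __ge__(self, other):
        Tracked.used.add("__ge__")
        return self.value >= other.value

    def __eq__(self, other):
        Tracked.used.add("__eq__")
        return self.value == other.value


items = [Tracked(v) for v in [5, 1, 2, 4, 3]]
result = sorted(items)
print([t.value for t in result])  # [1, 2, 3, 4, 5]
print(Tracked.used)               # {'__lt__'}
```

This is why a class only needs `__lt__` to be sortable, while cmp_to_key still provides all the rich comparisons for other uses.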

Answered By: juanpa.arrivillaga