How does Python decide when to call the comparator while sorting?
Question:
I have a small script that prints a message whenever the comparator is called:
from functools import cmp_to_key

def compare(a, b):
    print('comparator called')
    return a - b

mylist = [5, 1, 2, 4, 3]
sorted_list = sorted(mylist, key=cmp_to_key(compare))
print(sorted_list)

mylist = [1, 2, 3, 4, 5]
sorted_list = sorted(mylist, key=cmp_to_key(compare))
print(sorted_list)

mylist = [5, 4, 3, 2, 1]
sorted_list = sorted(mylist, key=cmp_to_key(compare))
print(sorted_list)

mylist = [5, 1, 2, 3, 4]
sorted_list = sorted(mylist, key=cmp_to_key(compare))
print(sorted_list)
Output:
comparator called
comparator called
comparator called
comparator called
comparator called
comparator called
comparator called
comparator called
[1, 2, 3, 4, 5]
comparator called
comparator called
comparator called
comparator called
[1, 2, 3, 4, 5]
comparator called
comparator called
comparator called
comparator called
[1, 2, 3, 4, 5]
comparator called
comparator called
comparator called
comparator called
comparator called
comparator called
comparator called
comparator called
[1, 2, 3, 4, 5]
You can see that, for the same input size, the number of comparator calls differs across the 4 cases.
Can someone help me understand how Python decides when to call the comparator and when not to?
Also, what are the best-, average-, and worst-case time complexities?
Answers:
The key function is called exactly once for each item in the list. But when you use cmp_to_key, it wraps your comparator function in an object. From the source code, it is equivalent to:
def cmp_to_key(mycmp):
    """Convert a cmp= function into a key= function"""
    class K(object):
        __slots__ = ['obj']
        def __init__(self, obj):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        __hash__ = None
    return K
(Note that in CPython it is actually implemented in C, if you want to see that implementation, but it is effectively the same.)
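To see the difference between "the key function runs once per item" and "the comparator runs once per comparison", here is a minimal sketch that counts both. The names counting_key, key_calls and cmp_calls are made up for this illustration and are not part of functools.

```python
from functools import cmp_to_key

key_calls = 0   # how many times the key function ran
cmp_calls = 0   # how many times the comparator ran

def compare(a, b):
    global cmp_calls
    cmp_calls += 1
    return a - b

# cmp_to_key returns a wrapper class; calling it on an item builds a key object.
wrap = cmp_to_key(compare)

def counting_key(x):
    # Hypothetical helper: counts key-function calls, then wraps the item.
    global key_calls
    key_calls += 1
    return wrap(x)

data = [5, 1, 2, 4, 3]
result = sorted(data, key=counting_key)

print(result)
print('key function calls:', key_calls)
print('comparator calls:', cmp_calls)
```

The key-function count always equals the length of the list, while the comparator count depends on the order of the input and, for some inputs, on the Python version.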
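So, while the key function is called only once for each item in the list, the comparison function is called every time the sort compares two items. (As an aside, the sort algorithm uses only < comparisons between items, but you might use cmp_to_key for other things that require the full rich comparison implementation.) CPython uses Timsort, a highly tuned, adaptive merge sort with worst-case and average-case O(N log N) behavior; in the best case, already sorted or nearly sorted data, it can be O(N). That is why your already sorted and reverse-sorted lists each take only N - 1 = 4 comparisons: Timsort detects the existing run (reversing a strictly descending one in place) and has nothing left to do, while the other two inputs need extra comparisons to put the out-of-place items where they belong.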
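If you want to watch the adaptive behavior on a bigger input, here is a rough sketch. Counted and count_comparisons are hypothetical helpers written for this answer; Counted defines only __lt__, which is enough for sorted() since the sort relies only on < comparisons. The exact counts are an implementation detail and can vary between Python versions.

```python
import random

class Counted:
    """Hypothetical key wrapper that counts every < comparison."""
    __slots__ = ('value',)
    calls = 0  # class-level counter shared by all instances

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value

def count_comparisons(values):
    """Sort values and return (number of < comparisons, sorted list)."""
    Counted.calls = 0
    ordered = sorted(values, key=Counted)
    return Counted.calls, ordered

n = 1000
ascending = list(range(n))
descending = ascending[::-1]
shuffled = ascending[:]
random.Random(0).shuffle(shuffled)  # fixed seed so runs are repeatable

asc_calls, asc_sorted = count_comparisons(ascending)
desc_calls, desc_sorted = count_comparisons(descending)
shuf_calls, shuf_sorted = count_comparisons(shuffled)

print('ascending :', asc_calls)
print('descending:', desc_calls)
print('shuffled  :', shuf_calls)
```

On CPython, the ascending and descending inputs should need roughly N comparisons, while the shuffled input needs on the order of N log N.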