How to write sort key functions for descending values?

Question:

The move in recent versions of Python to passing a key function to sort() from the previous cmp function is making it trickier for me to perform complex sorts on certain objects.

For example, I want to sort a set of objects from newest to oldest, with a set of string tie-breaker fields. So I want the dates in reverse order but the strings in their natural order. With a comparison function I can just reverse the comparison for the date field compared to the string fields. But with a key function I need to find some way to invert/reverse either the dates or the strings.

It’s easy (although ugly) to do with numbers – just subtract them from something – but do I have to find a similar hack for dates (subtract them from another date and compare the timedeltas?) and strings (…I have no idea how I’d reverse their order in a locale-independent way).

I know of the existence of functools.cmp_to_key() but it is described as being “primarily used as a transition tool for programs being converted to Python 3 where comparison functions are no longer supported”. This implies that I should be able to do what I want with the key method – but how?

Asked By: Kylotan


Answers:

Sort twice, once on each key, with one of the two sorts reversed.

(Python sort is stable; that is, it doesn’t change the order of the original list unless it has to.)

The order of the two sorts matters if you care about how equal elements get sorted: sort on the tie-breaker first and on the primary key last.

Answered By: Katriel

The most generic way to do this is simply to sort separately by each key in turn. Python’s sorting is always stable so it is safe to do this:

data.sort(key=tiebreakerkey)
data.sort(key=datekey, reverse=True)

will (assuming the relevant definitions for the key functions) give you the data sorted by descending date and ascending tiebreakers.
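As a minimal sketch of the two-pass approach, assuming the records are plain (name, date) tuples (the sample data and the bodies of the two key functions are made up for illustration):

```python
from datetime import date

# Hypothetical sample records: (name, date) tuples.
data = [
    ("beta", date(2020, 1, 1)),
    ("alpha", date(2021, 6, 1)),
    ("alpha", date(2020, 1, 1)),
    ("gamma", date(2021, 6, 1)),
]

def tiebreakerkey(record):
    # Assumed tie-breaker: the name field.
    return record[0]

def datekey(record):
    # Assumed primary key: the date field.
    return record[1]

# Least significant key first, most significant key last.
data.sort(key=tiebreakerkey)
data.sort(key=datekey, reverse=True)

for name, day in data:
    print(day.isoformat(), name)
```

Because the sort is stable even with reverse=True, records that share a date keep the ascending name order established by the first pass.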

Note that doing it this way is slower than producing a single composite key function because you will end up doing two complete sorts, so if you can produce a composite key that will be better, but splitting it out into separate sorts gives a lot of flexibility: given a key function for each column you can make any combination of them and specify reverse for any individual column.
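For dates specifically, a composite key is not too hard to build. The sketch below assumes (name, date) tuples and relies on date.toordinal() returning an int that can be negated; the record layout and the name composite_key are illustrative only. For datetime values something like timestamp() could play the same role.

```python
from datetime import date

# Hypothetical sample records: (name, date) tuples.
records = [
    ("beta", date(2020, 1, 1)),
    ("alpha", date(2021, 6, 1)),
    ("alpha", date(2020, 1, 1)),
    ("gamma", date(2021, 6, 1)),
]

def composite_key(record):
    name, day = record
    # Negating the ordinal flips the date order; the name keeps its natural order.
    return (-day.toordinal(), name)

result = sorted(records, key=composite_key)
```

This needs only one sort, but it works only because the date can be turned into a number. There is no equally simple trick for the string column.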

For a completely generic option:

keys = [ (datekey, True), (tiebreakerkey, False) ]
for key, rev in reversed(keys):
    data.sort(key=key, reverse=rev)

and for completeness, though I really think it should be avoided where possible:

from functools import cmp_to_key
data.sort(key=cmp_to_key(your_old_comparison_function))

The reason I think you should avoid this is that you go back to having on the order of n log n calls to the comparison function, compared with n calls to the key function (or 2n calls when you do the sorts twice).
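A rough way to see the difference is to count the calls. This is only a sketch: the counter dictionary and the two wrapper functions are made up for the demonstration, and the exact number of comparisons depends on the input.

```python
import random
from functools import cmp_to_key

n = 1000
rng = random.Random(0)  # fixed seed so the run is repeatable
values = [rng.random() for _ in range(n)]

calls = {"key": 0, "cmp": 0}

def counting_key(x):
    calls["key"] += 1
    return x

def counting_cmp(a, b):
    calls["cmp"] += 1
    return (a > b) - (a < b)

sorted(values, key=counting_key)
sorted(values, key=cmp_to_key(counting_cmp))

print("key calls:", calls["key"])
print("cmp calls:", calls["cmp"])
```

The key function runs exactly once per element, while the comparison function runs once per comparison the sort makes, which is far more often on shuffled input.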

Answered By: Duncan

I think the docs are incomplete. I interpret the word “primarily” to mean that there are still reasons to use cmp_to_key, and this is one of them. cmp was removed because it was an “attractive nuisance:” people would gravitate to it, even though key was a better choice.

But your case is clearly better as a cmp function, so use cmp_to_key to implement it.
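A sketch of what that could look like for the newest-first case, assuming (name, date) tuples; compare_records and the sample data are hypothetical:

```python
from datetime import date
from functools import cmp_to_key

# Hypothetical sample records: (name, date) tuples.
records = [
    ("beta", date(2020, 1, 1)),
    ("alpha", date(2021, 6, 1)),
    ("alpha", date(2020, 1, 1)),
    ("gamma", date(2021, 6, 1)),
]

def compare_records(a, b):
    name_a, day_a = a
    name_b, day_b = b
    if day_a != day_b:
        # Reversed comparison: the newer date sorts first.
        return -1 if day_a > day_b else 1
    # Natural (ascending) comparison for the string tie-breaker.
    return (name_a > name_b) - (name_a < name_b)

result = sorted(records, key=cmp_to_key(compare_records))
```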

Answered By: Ned Batchelder

The slow-but-elegant way to do this is to create a value wrapper that has reversed ordering:

from functools import total_ordering
@total_ordering
class ReversedOrder:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return other.value == self.value
    def __lt__(self, other):
        return other.value < self.value
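The wrapper can then sit inside a key tuple next to unwrapped fields. The sketch below repeats the class so that it runs on its own; the (name, date) records are made up for illustration.

```python
from datetime import date
from functools import total_ordering

@total_ordering
class ReversedOrder:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return other.value == self.value
    def __lt__(self, other):
        return other.value < self.value

# Hypothetical sample records: (name, date) tuples.
records = [
    ("beta", date(2020, 1, 1)),
    ("alpha", date(2021, 6, 1)),
    ("alpha", date(2020, 1, 1)),
    ("gamma", date(2021, 6, 1)),
]

# Dates descending (wrapped), names ascending (unwrapped).
result = sorted(records, key=lambda r: (ReversedOrder(r[1]), r[0]))

# It works for strings too, including prefixes.
words = sorted(["a", "ab", "b"], key=ReversedOrder)
```

Since the wrapper only delegates to the wrapped value's own comparisons, it should work for any type that supports == and <.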

If you don’t have functools.total_ordering, you’d have to implement all 6 comparisons, e.g.:

import operator
class ReversedOrder:
    def __init__(self, value):
        self.value = value
for x in ['__lt__', '__le__', '__eq__', '__ne__', '__ge__', '__gt__']:
    op = getattr(operator, x)
    setattr(ReversedOrder, x, lambda self, other, op=op: op(other.value, self.value))

Answered By: ecatmur

For strings, you can pick a commonly accepted maximum character value (such as 2^16 or 2^32) and use chr() and ord() (unichr() or unicode() on Python 2) to do the math, just like for integers.

In one of my projects, I know that I deal with UTF-8 strings whose ordinals are below 0xffff, so I wrote:

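def string_inverse(s):
    # Mirror each character around max_char_val so that string order is reversed.
    # Assumes every ordinal in s is at most 0xffff. Caveat: a string that is a
    # prefix of another string still compares as smaller after inversion.
    max_char_val = 0xffff
    # chr() is the Python 3 spelling; on Python 2 use unichr() instead.
    return ''.join(chr(max_char_val - ord(c)) for c in s)

result.sort(key=lambda x: (x[1], string_inverse(x[0])), reverse=True)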

Each x is a (string, int) tuple, so what I get is, to abuse SQL:

select * from result order by x[1] desc, x[0] asc;

Answered By: Kun Wu

One way is to use the pandas library and the ascending argument of sort_values, listing which columns to sort ascending and which descending, e.g. ascending=[True,False,False].

You can do that not only for two levels (e.g. datetime and str) but for any number of levels needed.

For example, if you have

from datetime import datetime
import pandas as pd

d = [[1, 2, datetime(2017,1,2)],
     [2, 2, datetime(2017,1,4)],
     [2, 3, datetime(2017,1,3)],
     [2, 3, datetime(2017,1,4)],
     [2, 3, datetime(2017,1,5)],
     [2, 4, datetime(2017,1,1)],
     [3, 1, datetime(2017,1,2)]]

You can set up your DataFrame

df = pd.DataFrame(d)

and use sort_values

sorted_df = df.sort_values(by=[0,1,2], ascending=[True,False,False])
sorted_list = sorted_df.agg(list, 1).tolist()


[[1, 2, Timestamp('2017-01-02 00:00:00')],
 [2, 4, Timestamp('2017-01-01 00:00:00')],
 [2, 3, Timestamp('2017-01-05 00:00:00')],
 [2, 3, Timestamp('2017-01-04 00:00:00')],
 [2, 3, Timestamp('2017-01-03 00:00:00')],
 [2, 2, Timestamp('2017-01-04 00:00:00')],
 [3, 1, Timestamp('2017-01-02 00:00:00')]]

Notice that the first column is sorted ascending, and the second and third are descending, which is of course due to setting ascending=[True,False,False].

Answered By: rafaelc

Try this:

>>> import functools
>>> reverse_key = functools.cmp_to_key(lambda a, b: (a < b) - (a > b))
>>> reverse_key(3) < reverse_key(4)
False
>>> reverse_key(3) > reverse_key(4)
True
>>> reverse_key('a') < reverse_key('b')
False
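To use this in an actual sort, the reversed key can go inside a key tuple. A sketch, assuming (name, date) tuples as records (the sample data is made up):

```python
import functools
from datetime import date

reverse_key = functools.cmp_to_key(lambda a, b: (a < b) - (a > b))

# Hypothetical sample records: (name, date) tuples.
records = [
    ("beta", date(2020, 1, 1)),
    ("alpha", date(2021, 6, 1)),
    ("alpha", date(2020, 1, 1)),
    ("gamma", date(2021, 6, 1)),
]

# Dates descending via reverse_key, names in their natural ascending order.
result = sorted(records, key=lambda r: (reverse_key(r[1]), r[0]))
```

Note that this still calls the comparison lambda once per date comparison, so it has the same cost as any other cmp_to_key key.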
Answered By: Yankai Zhang