Sort by order of another list, where that list is nested with 2 sort keys, using Python

Question:

I know how to sort one list to match the order of another list. I also know how to sort one list on two keys (using a lambda function like key = lambda i: (i[0], i[1])). But I need to sort one list to match the order of another list, where that second list has two sort keys. For example:

order_list = [['a', 2], ['c', 3], ['b', 1], ['e', 4]]
listB = [['c', 3, 'red', 'car'], ['e', 4, 'green', 'bus'], ['b', 1, 'blue', 'bike'], ['a', 2, 'yellow', 'plane']]

Desired output:

sorted_listB = [['a', 2, 'yellow', 'plane'], ['c', 3, 'red', 'car'], ['b', 1, 'blue', 'bike'],['e', 4, 'green', 'bus']]

I tried writing this. Even though it’s bad form, I just wanted to see if it would work, and it does not:

def sort_key(x):
    """ Find the matching element in reference sorted list
    """
    # bad form, referencing non-local var
    for a in order_list:
        if a[0] == x[0] and a[1] == x[1]:
            return a

    sorted_listB = sorted(listB, key = sort_key)

Any clever thoughts on how to do this? Preferably without turning the nested list into a single key. I know I can do that…was trying to stretch my skills and do it this way, but I’m stuck.

Asked By: Jess

Answers:

One approach:

order_list = [['a', 2], ['c', 3], ['b', 1], ['e', 4]]
listB = [['c', 3, 'red', 'car'], ['e', 4, 'green', 'bus'], ['b', 1, 'blue', 'bike'], ['a', 2, 'yellow', 'plane']]

# use a dictionary to map the values to the priorities
keys = {tuple(e): i for i, e in enumerate(order_list)}

# then use the first two elements of each sub-list to look up the priority
sorted_listB = sorted(listB, key=lambda x: keys.get(tuple(x[:2])))
print(sorted_listB)

Output

[['a', 2, 'yellow', 'plane'], ['c', 3, 'red', 'car'], ['b', 1, 'blue', 'bike'], ['e', 4, 'green', 'bus']]

Or, if you wish, fix your function to return index values using enumerate as below:

def sort_key(x):
    """ Find the matching element in reference sorted list
    """
    # bad form, referencing non-local var
    for i, a in enumerate(order_list):
        if a[0] == x[0] and a[1] == x[1]:
            return i

Answered By: Dani Mesejo

Per external discussion: the general form of this question is a common one, but I cannot easily find a proper duplicate at the moment. There is a trap here for duplicate closers: the popular question Sorting list based on values from another list sounds right, but describes a completely different problem (arguably the title is a bit inaccurate). I am still trying to identify proper canonicals, both for the specific problem of "base the sorting on the position of a related element in another list" – which comes up a lot! – and the general idea of how sorting "by key" works in Python. In the meantime, I’ll contribute an answer, expecting to move this content somewhere else later.


Sorting by key in Python

The general idea is that, when we write sorted(a_sequence, key=my_func), Python will give us elements from a_sequence sorted according to the results of my_func. That is to say:

  • For each element e in a_sequence, my_func(e) is computed.
  • The "natural" sorting order of the my_func(e) results is determined.
  • The original e values are swapped around in the same way that the my_func(e) results would need to be swapped.
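
As a rough illustration of those three steps (a sketch of the idea only, not how CPython implements sorting internally), the same result can be produced by hand: pair each element with its key, sort the pairs, then throw the keys away. The name sorted_by_key and the sample words are made up for this example.

```python
def sorted_by_key(a_sequence, my_func):
    # Step 1: compute my_func(e) once per element; keep the original
    # position so that ties never fall through to comparing e itself.
    decorated = [(my_func(e), i, e) for i, e in enumerate(a_sequence)]
    # Step 2: sort by the natural order of the (key, position) pairs.
    decorated.sort()
    # Step 3: keep only the original elements, now in key order.
    return [e for _, _, e in decorated]


words = ['banana', 'fig', 'cherry', 'kiwi']
print(sorted_by_key(words, len))   # ['fig', 'kiwi', 'banana', 'cherry']
print(sorted(words, key=len))      # same result
```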

So what’s wrong with the attempted key-function here?

Presuming that the sorted call is correctly placed outside of the key-function that it will use, our key-function (with the above names) looks like:

def my_func(e):
    for a in order_list:
        if a[0] == e[0] and a[1] == e[1]:
            return a

The issue here is what gets returned. After verifying that the e element matches one of the external list’s values, we return… that value. Which will then sort naturally: i.e., ['b', 1] comes before ['c', 3] normally, so the value "keyed with" ['b', 1] will come before the one keyed with ['c', 3] in the sort-by-key (just as it would normally, since we ended up just taking a prefix of the elements).
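
A quick check makes this concrete. Using the question's data and the key-function above, sorting by the returned element gives exactly the same order as a plain sorted(listB) with no key at all:

```python
order_list = [['a', 2], ['c', 3], ['b', 1], ['e', 4]]
listB = [['c', 3, 'red', 'car'], ['e', 4, 'green', 'bus'],
         ['b', 1, 'blue', 'bike'], ['a', 2, 'yellow', 'plane']]


def my_func(e):
    # Returns the matching order_list entry itself, not its position.
    for a in order_list:
        if a[0] == e[0] and a[1] == e[1]:
            return a


by_returned_element = sorted(listB, key=my_func)
print(by_returned_element == sorted(listB))   # True
print([e[0] for e in by_returned_element])    # ['a', 'b', 'c', 'e']
```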

Instead, our idea is that the element matching (i.e., having a prefix) ['c', 3] should come before ['b', 1], because it appears earlier in the order_list. In other words, the position within the order_list is what we want to use for the key.

So, most directly, we could modify the loop to keep track of that position, in order to return it:

def my_func(e):
    for i, a in enumerate(order_list):
        if a[0] == e[0] and a[1] == e[1]:
            return i

Simplification

Of course, it would be better to use built-in tools more directly, rather than having to write a loop explicitly. The first thing we notice is that we don’t really have a complex comparison – we just care about whether a list of the first two elements of e – a simple slice of e – equals a:

def my_func(e):
    for i, a in enumerate(order_list):
        if e[:2] == a:
            return i

Looking at it this way, we can see more clearly that what we are doing is finding the index of e[:2] within the order_list. And, of course, we know that this is built-in:

def my_func(e):
    return order_list.index(e[:2])

Finally, since our key-function is now a one-liner, we can replace it with a lambda (which should also help us feel better about the dependency on the outer-scope order_list, since it will obviously be in scope at the point where we attempt sorting):

sorted(listB, key=lambda e: order_list.index(e[:2]))

An important performance consideration

For this specific task, however, we may get unnecessarily poor performance, depending on the size of order_list. The issue is that .index has to do a linear search of order_list for each element of listB, so the total work grows with len(listB) * len(order_list).

To avoid this, we can build a direct lookup first. We want a dictionary that maps from the order_list elements into their corresponding .index results, which only requires iterating once; then we can directly look up the indexes in the key-function, using fast dict lookup.

With the given input, there is a slight complication: lists aren’t hashable, so we can’t use them as dict keys. We’ll work around this by storing tuples instead, and making that conversion in the key-function as well.
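
A small sketch of that complication, using one entry from the question's order_list (the variable names here are only for illustration):

```python
pair = ['a', 2]

try:
    {pair: 0}              # a list cannot be a dict key
    list_key_worked = True
except TypeError:
    list_key_worked = False

as_tuple = {tuple(pair): 0}    # the tuple with the same contents works

print(list_key_worked)         # False
print(as_tuple[('a', 2)])      # 0
```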

Building the lookup dict is easy using a dict comprehension:

lookup = {tuple(e): i for i, e in enumerate(order_list)}

Then we want our lambda to slice, convert to tuple, and do dict lookup:

sorted_listB = sorted(listB, key=lambda e: lookup[tuple(e[:2])])
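
To get a feel for the difference, here is a rough timing sketch. The data is synthetic and its size (n = 1000) is an arbitrary assumption; the actual numbers will vary by machine, so none are shown. The function names with_index and with_lookup are made up for this comparison.

```python
import random
import timeit

n = 1000
random.seed(0)

# Synthetic stand-ins for the question's lists, just much longer.
order_list = [[str(i), i] for i in range(n)]
random.shuffle(order_list)
listB = [pair + ['payload'] for pair in order_list]
random.shuffle(listB)


def with_index():
    # Linear search of order_list for every element of listB.
    return sorted(listB, key=lambda e: order_list.index(e[:2]))


def with_lookup():
    # One pass to build the dict, then constant-time lookups on average.
    lookup = {tuple(e): i for i, e in enumerate(order_list)}
    return sorted(listB, key=lambda e: lookup[tuple(e[:2])])


# Both approaches must agree before the timings mean anything.
assert with_index() == with_lookup()

print('index :', timeit.timeit(with_index, number=3))
print('lookup:', timeit.timeit(with_lookup, number=3))
```

For a four-element order_list like the one in the question, the difference is negligible; it only starts to matter as the lists grow.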

But what if the order_list is missing values?

All of the above approaches, so far, are lacking proper error handling. In the original fixed key function, with an explicit loop, None would be returned if some element didn’t match any of the order_list entries. In Python 3.x, this would generally cause an exception:

>>> bad_list = [['a', 2, 'yellow', 'plane'], ['x', 0, 'invalid', 'element']]
>>> sorted(bad_list, key=my_func)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'NoneType' and 'int'

Similarly, the .index-based key will immediately fail:

>>> sorted(bad_list, key=lambda e:order_list.index(e[:2]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
ValueError: ['x', 0] is not in list

And dict lookup will fail, too:

>>> sorted(bad_list, key=lambda e: lookup[tuple(e[:2])])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
KeyError: ('x', 0)

Using .get can help in the latter case, but we need to default to some integer value directly – otherwise, we still have the comparison problem later:

>>> sorted(bad_list, key=lambda e: lookup.get(tuple(e[:2])))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'NoneType' and 'int'
>>> sorted(bad_list, key=lambda e: lookup.get(tuple(e[:2]), len(order_list)))
[['a', 2, 'yellow', 'plane'], ['x', 0, 'invalid', 'element']]

Why len(order_list)? By making it a value greater than any of the valid indices, we ensure non-matching elements go to the end. This is the simplest way to ensure the value is an integer greater than the valid indices.
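
One consequence worth seeing: every non-matching element gets the same key, and because Python's sort is stable, those elements keep their original relative order at the end. A small sketch, with a made-up mixed list containing two unknown entries:

```python
order_list = [['a', 2], ['c', 3], ['b', 1], ['e', 4]]
lookup = {tuple(e): i for i, e in enumerate(order_list)}

# Hypothetical input: 'x' and 'y' do not appear in order_list.
mixed = [['x', 0, 'unknown', 'first'], ['b', 1, 'blue', 'bike'],
         ['y', 9, 'unknown', 'second'], ['a', 2, 'yellow', 'plane']]

result = sorted(mixed, key=lambda e: lookup.get(tuple(e[:2]), len(order_list)))
print([e[0] for e in result])   # ['a', 'b', 'x', 'y']
```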

Fixing the problem for a direct .index based approach requires explicit exception handling, so it won’t neatly fit in a lambda. Fixing the original version with an explicit loop is just a matter of explicitly returning an appropriate value after the loop (left as an exercise).
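
For reference, one possible way to write both fixes, again sending unknown elements to the end by using len(order_list) as the fallback. The names index_key and loop_key are made up here, and other fallback policies (such as raising a clearer error) would be equally reasonable.

```python
order_list = [['a', 2], ['c', 3], ['b', 1], ['e', 4]]


def index_key(e):
    # .index raises ValueError when the prefix is missing, so catch it.
    try:
        return order_list.index(e[:2])
    except ValueError:
        return len(order_list)


def loop_key(e):
    for i, a in enumerate(order_list):
        if e[:2] == a:
            return i
    # Reached only when no entry matched: sort after all valid indices.
    return len(order_list)


bad_list = [['x', 0, 'invalid', 'element'], ['a', 2, 'yellow', 'plane']]
print(sorted(bad_list, key=index_key))
print(sorted(bad_list, key=loop_key))
# both print [['a', 2, 'yellow', 'plane'], ['x', 0, 'invalid', 'element']]
```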

Answered By: Karl Knechtel