Sort list by nested tuple values

Question:

Is there a better way to sort a list by nested tuple values than writing an itemgetter alternative that extracts the nested tuple value:

def deep_get(*idx):
    def g(t):
        for i in idx:
            t = t[i]
        return t
    return g

>>> l = [((2,1), 1),((1,3), 1),((3,6), 1),((4,5), 2)]
>>> sorted(l, key=deep_get(0,0))
[((1, 3), 1), ((2, 1), 1), ((3, 6), 1), ((4, 5), 2)]
>>> sorted(l, key=deep_get(0,1))
[((2, 1), 1), ((1, 3), 1), ((4, 5), 2), ((3, 6), 1)]

I thought about using compose, but that’s not in the standard library:

sorted(l, key=compose(itemgetter(1), itemgetter(0)))
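
For reference, a minimal compose is only a few lines of pure Python. This is a sketch, not a standard-library function: the name compose and its right-to-left order (compose(f, g)(x) == f(g(x))) are assumptions chosen to match the call above.

```python
from functools import reduce
from operator import itemgetter

def compose(*funcs):
    # Hypothetical helper, NOT in the standard library: compose(f, g)(x) == f(g(x)).
    # Functions are applied right to left, like mathematical composition.
    return lambda x: reduce(lambda acc, f: f(acc), reversed(funcs), x)

l = [((2, 1), 1), ((1, 3), 1), ((3, 6), 1), ((4, 5), 2)]

# itemgetter(0) picks the inner (a, b) tuple, then itemgetter(1) picks b.
print(sorted(l, key=compose(itemgetter(1), itemgetter(0))))
# [((2, 1), 1), ((1, 3), 1), ((4, 5), 2), ((3, 6), 1)]
```

The result matches sorted(l, key=deep_get(0,1)); the extra function call per level makes it unlikely to be faster than a plain lambda.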

Is there something I missed in the libs that would make this code nicer?

The implementation should work reasonably with 100k items.

Context: I would like to sort the items of a dictionary that represents a histogram. The keys are tuples (a, b) and the values are counts. In the end, the items should be sorted by count (descending), then by a and b. An alternative is to flatten the tuple and use itemgetter directly, but that way a lot of tuples would be generated.
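
A sketch of the flattening alternative, assuming a hypothetical hist dict holding the example data: each item becomes one flat (a, b, count) tuple that itemgetter can index directly, at the cost of building one new tuple per item.

```python
from operator import itemgetter

# Hypothetical histogram: keys are (a, b) tuples, values are counts.
hist = {(2, 1): 1, (1, 3): 1, (3, 6): 1, (4, 5): 2}

# Flatten each ((a, b), count) item into a single (a, b, count) tuple.
flat = [(a, b, count) for (a, b), count in hist.items()]

# Python's sort is stable: sort by the secondary keys (a, b) first,
# then by count descending; equal counts keep their (a, b) order.
flat.sort(key=itemgetter(0, 1))
flat.sort(key=itemgetter(2), reverse=True)
print(flat)  # [(4, 5, 2), (1, 3, 1), (2, 1, 1), (3, 6, 1)]
```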

Asked By: Thomas Jung


Answers:

Yes, you could just use key=lambda x: x[0][1].
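
A small sketch of that lambda on the example list. The second key (the name by_count_a_b is just for illustration) shows how the same idea extends to the ordering from the question by returning a tuple: count descending via negation, then a, then b.

```python
l = [((2, 1), 1), ((1, 3), 1), ((3, 6), 1), ((4, 5), 2)]

# Sort by the second element of the inner (a, b) tuple.
by_b = sorted(l, key=lambda x: x[0][1])

# Return a tuple to combine several nested values:
# count descending (negated), then a, then b.
by_count_a_b = sorted(l, key=lambda x: (-x[1], x[0][0], x[0][1]))

print(by_b)          # [((2, 1), 1), ((1, 3), 1), ((4, 5), 2), ((3, 6), 1)]
print(by_count_a_b)  # [((4, 5), 2), ((1, 3), 1), ((2, 1), 1), ((3, 6), 1)]
```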

Answered By: ninjagecko

Your approach is quite good, given the data structure that you have.

Another approach would be to use a different data structure.

If you want speed, the de facto standard NumPy is the way to go. Its job is to efficiently handle large arrays, and it even has some nice sorting routines for arrays like yours. Here is how you would write your sort over the counts, and then over (a, b), both ascending:

>>> import numpy
>>> arr = numpy.array([((2,1), 1),((1,3), 1),((3,6), 1),((4,5), 2)],
...                   dtype=[('pos', [('a', int), ('b', int)]), ('count', int)])
>>> print(numpy.sort(arr, order=['count', 'pos']))
[((1, 3), 1) ((2, 1), 1) ((3, 6), 1) ((4, 5), 2)]

This is very fast (it’s implemented in C).

If you want to stick with standard Python, a list of (count, a, b) tuples would be sorted the way you want automatically, because Python compares tuples lexicographically (store -count instead of count if the counts must come out in descending order).
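
A minimal sketch of that idea, assuming the histogram is a dict named hist (hypothetical) with numeric counts. Storing -count is an added trick for the descending order; plain tuple comparison does the rest, so no key function is needed.

```python
hist = {(2, 1): 1, (1, 3): 1, (3, 6): 1, (4, 5): 2}

# (-count, a, b): ascending tuple order gives count descending, then a, then b.
rows = sorted((-count, a, b) for (a, b), count in hist.items())

# Undo the negation when reading the rows back.
result = [(a, b, -neg) for neg, a, b in rows]
print(result)  # [(4, 5, 2), (1, 3, 1), (2, 1, 1), (3, 6, 1)]
```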

Answered By: Eric O Lebigot

I compared two similar solutions. The first one uses a simple lambda:

def sort_one(d):
    result = list(d.items())  # in Python 3, items() returns a view without .sort()
    result.sort(key=lambda x: (-x[1], x[0]))
    return result

Note the minus on x[1], because you want the sort to be descending on count.

The second one takes advantage of the fact that sort in Python is stable. First, we sort by (a, b) (ascending). Then we sort by count, descending:

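from operator import itemgetter

def sort_two(d):
    result = list(d.items())
    result.sort()  # by (a, b) ascending; keys are unique, so counts never decide
    result.sort(key=itemgetter(1), reverse=True)  # by count descending; stable, keeps (a, b) order
    return result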

The first one is 10-20% faster (on both small and large datasets), and both complete in under 0.5 s on my Q6600 (one core used) for 100k items. So avoiding the creation of tuples doesn't seem to help much.
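
A rough way to reproduce this comparison is a timeit sketch like the following. The random 100k-item histogram is made-up data, and absolute timings depend on the machine and Python version, so no numbers are claimed here.

```python
import random
import timeit
from operator import itemgetter

def sort_one(d):
    result = list(d.items())
    result.sort(key=lambda x: (-x[1], x[0]))
    return result

def sort_two(d):
    result = list(d.items())
    result.sort()
    result.sort(key=itemgetter(1), reverse=True)
    return result

# Hypothetical test data: 100k distinct (a, b) keys with random counts.
random.seed(0)
d = {}
while len(d) < 100_000:
    d[(random.randrange(1000), random.randrange(1000))] = random.randrange(50)

# Both functions must agree before their timings are worth comparing.
assert sort_one(d) == sort_two(d)

for f in (sort_one, sort_two):
    # Best of 5 single calls; the lambda runs immediately, so it sees the current f.
    t = min(timeit.repeat(lambda: f(d), number=1, repeat=5))
    print(f"{f.__name__}: {t:.3f} s")
```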

Answered By: Dzinx

This might be a little faster version of your approach:

from functools import reduce  # reduce is no longer a builtin in Python 3

l = [((2,1), 1), ((1,3), 1), ((3,6), 1), ((4,5), 2)]

def deep_get(*idx):
    def g(t):
        return reduce(lambda t, i: t[i], idx, t)
    return g

>>> sorted(l, key=deep_get(0,1))
[((2, 1), 1), ((1, 3), 1), ((4, 5), 2), ((3, 6), 1)]

Which could be shortened to:

def deep_get(*idx):
    return lambda t: reduce(lambda t, i: t[i], idx, t)

or even just written out inline:

sorted(l, key=lambda t: reduce(lambda t, i: t[i], (0,1), t))
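
A variant of the same idea, offered as a sketch rather than a measured improvement: operator.getitem already performs t[i], so it can be passed to reduce directly instead of an inner lambda.

```python
from functools import reduce
from operator import getitem

def deep_get(*idx):
    # reduce applies getitem(t, i) for each index in turn,
    # so deep_get(0, 1)(t) == t[0][1].
    return lambda t: reduce(getitem, idx, t)

l = [((2, 1), 1), ((1, 3), 1), ((3, 6), 1), ((4, 5), 2)]
print(sorted(l, key=deep_get(0, 1)))
# [((2, 1), 1), ((1, 3), 1), ((4, 5), 2), ((3, 6), 1)]
```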

Answered By: martineau