# Sorting a 2D numpy array by multiple axes

## Question:

I have a 2D numpy array of shape (N,2) which is holding N points (x and y coordinates). For example:

``````
array([[3, 2],
       [6, 2],
       [3, 6],
       [3, 4],
       [5, 3]])
``````

I’d like to sort it such that my points are ordered by x-coordinate, and then by y in cases where the x coordinate is the same. So the array above should look like this:

``````
array([[3, 2],
       [3, 4],
       [3, 6],
       [5, 3],
       [6, 2]])
``````

If this was a normal Python list, I would simply define a comparator to do what I want, but as far as I can tell, numpy’s sort function doesn’t accept user-defined comparators. Any ideas?

EDIT: Thanks for the ideas! I set up a quick test case with 1000000 random integer points, and benchmarked the ones that I could run (sorry, can’t upgrade numpy at the moment).

``````
Mine:   4.078 secs
mtrw:   7.046 secs
unutbu: 0.453 secs
``````

## Answers:

Here’s one way to do it using an intermediate structured array:

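``````
import numpy as np

a = np.array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])

b = a.flatten()                               # contiguous 1D copy; a is untouched
b = b.view([('x', a.dtype), ('y', a.dtype)])  # each (x, y) pair becomes one record
b.sort()                                      # records compare field by field: x, then y
b = b.view(a.dtype).reshape(a.shape)          # back to a plain (N, 2) integer array

print(b)
``````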

which gives the desired output:

``````
[[3 2]
 [3 4]
 [3 6]
 [5 3]
 [6 2]]
``````

Not sure if this is quite the best way to go about it though.

I found one way to do it:

``````
from numpy import array
a = array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])
array(sorted(sorted(a, key=lambda e: e[1]), key=lambda e: e[0]))
``````

It’s pretty terrible to have to sort twice (and use the plain python `sorted` function instead of a faster numpy sort), but it does fit nicely on one line.
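The double sort works because Python’s `sorted` is stable: rows with equal x keep the y-order produced by the first pass. A single pass with a tuple key gives the same result; a minimal sketch, assuming the small array from the question:

```python
import numpy as np

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])

# Tuples compare element by element, so the key (x, y) orders by x first
# and falls back to y only when the x values tie: one sort instead of two.
rows = sorted(a.tolist(), key=lambda e: (e[0], e[1]))
b = np.array(rows)
print(b)
```

Since Python lists also compare lexicographically, `sorted(a.tolist())` without a key does the same thing. This is still a pure-Python sort, so it is not expected to compete with the numpy-based answers on large arrays.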

Using lexsort:

``````
import numpy as np
a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])

ind = np.lexsort((a[:, 1], a[:, 0]))

a[ind]
# array([[3, 2],
#        [3, 4],
#        [3, 6],
#        [5, 3],
#        [6, 2]])
``````
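`np.lexsort` uses the *last* key in the sequence as the primary sort key, which is why the x column `a[:, 0]` comes second in the tuple above. Swapping the keys sorts by y first and breaks ties with x; a small sketch that makes the key order visible:

```python
import numpy as np

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])

# Last key is primary: sort by x, break ties with y.
by_x_then_y = a[np.lexsort((a[:, 1], a[:, 0]))]

# Keys swapped: sort by y, break ties with x.
by_y_then_x = a[np.lexsort((a[:, 0], a[:, 1]))]

print(by_x_then_y)
print(by_y_then_x)
```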

`a.ravel()` returns a view if `a` is `C_CONTIGUOUS`. If that is true, @ars’s method, slightly modified by using `ravel` instead of `flatten`, yields a nice way to sort `a` in place:

``````
import numpy as np

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])
dt = [('col1', a.dtype), ('col2', a.dtype)]
assert a.flags['C_CONTIGUOUS']
b = a.ravel().view(dt)
b.sort(order=['col1', 'col2'])
``````

Since `b` is a view of `a`, sorting `b` sorts `a` as well:

``````
print(a)
# [[3 2]
#  [3 4]
#  [3 6]
#  [5 3]
#  [6 2]]
``````
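The same view trick extends to any number of columns by building one field per column. Below is a sketch with a hypothetical helper `sort_rows_inplace` (the name and field names `f0`, `f1`, … are illustrative); it assumes a 2D C-contiguous array and raises a `ValueError` otherwise, because for a non-contiguous array `ravel()` would return a copy and the sort would never reach `a`:

```python
import numpy as np

def sort_rows_inplace(a):
    """Hypothetical helper: sort the rows of a 2D C-contiguous array in place,
    lexicographically (column 0 first, then column 1, and so on)."""
    if a.ndim != 2 or not a.flags['C_CONTIGUOUS']:
        raise ValueError("expected a 2D C-contiguous array")
    # One field per column; the field order sets the sort priority.
    dt = [('f%d' % i, a.dtype) for i in range(a.shape[1])]
    # Each row becomes a single record; this view shares memory with a.
    b = a.ravel().view(dt)
    # Structured arrays sort by comparing fields in order.
    b.sort()

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])
sort_rows_inplace(a)
print(a)
```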

You can use `np.sort_complex`. This has the side effect of changing your data to floating point; I hope that’s not a problem:

``````
>>> import numpy as np
>>> a = np.array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])
>>> atmp = np.sort_complex(a[:,0] + a[:,1]*1j)
>>> b = np.array([[np.real(x), np.imag(x)] for x in atmp])
>>> b
array([[3., 2.],
       [3., 4.],
       [3., 6.],
       [5., 3.],
       [6., 2.]])
``````
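The list comprehension in the last step loops in Python; the `.real` and `.imag` attributes give the same two columns in one vectorized step. A sketch, assuming integer input whose magnitudes stay below 2**53 so the round trip through float64 is exact:

```python
import numpy as np

a = np.array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])

# Encode each row as x + y*1j; sort_complex orders by real part, then imaginary part.
atmp = np.sort_complex(a[:, 0] + a[:, 1] * 1j)

# Rebuild the (N, 2) float array without a Python loop.
b = np.column_stack((atmp.real, atmp.imag))

# Optional: cast back to the original integer dtype (exact only while |values| < 2**53).
b_int = b.astype(a.dtype)
print(b_int)
```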

I was struggling with the same thing and just got help and solved the problem. It works smoothly if your array has column names (a structured array), and I think this is a very simple way to sort using the same logic that Excel does:

``````
array_name[array_name[['colname1','colname2']].argsort()]
``````

Note the double brackets enclosing the sorting criteria. And of course, you can use more than two columns as sorting criteria.
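A runnable sketch of the named-column approach, using made-up field names `x` and `y` for illustration. It uses the `order` argument of `np.argsort`, which names the fields to compare first, second, and so on; this is a documented alternative to the double-bracket field selection rather than the exact call shown above:

```python
import numpy as np

# Structured array with hypothetical field names 'x' and 'y'.
pts = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)],
               dtype=[('x', 'i8'), ('y', 'i8')])

# Sort indices: compare 'x' first, then 'y' to break ties.
idx = np.argsort(pts, order=['x', 'y'])
print(pts[idx])
```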

The title says “sorting 2D arrays”. Although the questioner uses an `(N,2)`-shaped array, it’s possible to generalize unutbu’s solution to work with any `(N,M)` array, as that’s what people might actually be looking for.

One could `transpose` the array and use slice notation with negative `step` to pass all the columns to `lexsort` in reversed order:

``````
>>> import numpy as np
>>> a = np.random.randint(1, 6, (10, 3))
>>> a
array([[4, 2, 3],
       [4, 2, 5],
       [3, 5, 5],
       [1, 5, 5],
       [3, 2, 1],
       [5, 2, 2],
       [3, 2, 3],
       [4, 3, 4],
       [3, 4, 1],
       [5, 3, 4]])

>>> a[np.lexsort(np.transpose(a)[::-1])]
array([[1, 5, 5],
       [3, 2, 1],
       [3, 2, 3],
       [3, 4, 1],
       [3, 5, 5],
       [4, 2, 3],
       [4, 2, 5],
       [4, 3, 4],
       [5, 2, 2],
       [5, 3, 4]])
``````
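To sort by a chosen subset or priority of columns rather than all of them left to right, the keys can be picked out and reversed before calling `lexsort`. A sketch with a hypothetical helper `sort_rows_by`, where `cols` lists column indices from highest to lowest priority:

```python
import numpy as np

def sort_rows_by(a, cols=None):
    """Hypothetical helper: return a copy of 2D array `a` with rows sorted by the
    columns in `cols` (highest priority first). Defaults to all columns, left to right."""
    if cols is None:
        cols = range(a.shape[1])
    # lexsort treats the last key as primary, so reverse the priority list.
    keys = tuple(a[:, c] for c in reversed(list(cols)))
    return a[np.lexsort(keys)]

a = np.array([[4, 2, 3],
              [4, 2, 5],
              [3, 5, 5],
              [1, 5, 5],
              [3, 2, 1]])

print(sort_rows_by(a))          # column 0, then 1, then 2
print(sort_rows_by(a, [2, 0]))  # column 2, ties broken by column 0
```

With the default `cols`, this matches the `np.transpose(a)[::-1]` call above; passing `cols` explicitly lets you skip columns or change their priority.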

The numpy_indexed package (disclaimer: I am its author) can be used to solve this kind of processing-on-nd-array problem in an efficient, fully vectorized manner:

``````
import numpy_indexed as npi
npi.sort(a)  # by default along axis=0, but configurable
``````