In-place numpy array sorting according to a given index

Question:

There are some questions that come close, but I haven’t found a specific answer to this. I’m trying to sort a 3D numpy array in place along a given axis. I don’t want a simple sort, though; I want to reorder the array according to my own index. For example:

a = np.random.rand(3, 3, 3)

and let’s say I want to reorder the last dimension according to the following indices of the old array:

new_order = [1,2,0]

I would expect to be able to say:

a[:,:,new_order] = a

but this does not behave as expected. Suggestions?

Asked By: James

Answers:

Here you are,

a = a[:, :, new_order]
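
As a quick check of what this line does on a 3D array like the one in the question, here is a minimal sketch (the names original and reordered are only illustrative). It confirms that the last axis is gathered in the requested order, and that the result is a newly allocated array rather than the old buffer rearranged:

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.random((3, 3, 3))
new_order = [1, 2, 0]

# Indexing with a list (advanced indexing) returns a copy, so the result
# lives in a freshly allocated buffer; the old buffer is left untouched.
reordered = original[:, :, new_order]

# Slot k along the last axis now holds what was at index new_order[k].
for k, src in enumerate(new_order):
    assert np.array_equal(reordered[:, :, k], original[:, :, src])

shares = np.shares_memory(reordered, original)
print(shares)  # False
```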

Also, here are a couple ‘numpy for Matlab users’ pages that I found useful when I was getting started:

Link

http://mathesaurus.sourceforge.net/matlab-numpy.html

Answered By: farenorth

np.ndarray.sort is the only sort that claims to be inplace, and it does not give you much control.

Placing the order index on the left works, but can give unpredictable results. Evidently it is doing some sort of sequential assignment, and an earlier assignment on the left can affect values that are read later on the right.

In [719]: a=np.arange(12).reshape(3,4)
In [720]: a[:,[0,1,3,2]]=a
In [721]: a
Out[721]: 
array([[ 0,  1,  2,  2],
       [ 4,  5,  6,  6],
       [ 8,  9, 10, 10]])

To do this sort of assignment predictably requires some sort of buffering.

In [728]: a[:,[0,1,3,2]]=a.copy()
In [729]: a
Out[729]: 
array([[ 0,  1,  3,  2],
       [ 4,  5,  7,  6],
       [ 8,  9, 11, 10]])
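
One more caveat with indexing on the left: it scatters rather than gathers, so it only matches the intended reordering when the permutation is its own inverse (as the column swap above happens to be). A minimal sketch with the question's new_order = [1, 2, 0], on a 1D array for readability (the names src, gathered and scattered are illustrative):

```python
import numpy as np

new_order = [1, 2, 0]
src = np.array([10, 20, 30])

# Gather: result[k] = src[new_order[k]]
gathered = src[new_order]

# Scatter: result[new_order[k]] = src[k]
# (written into a separate array, so no overlap is involved)
scattered = np.empty_like(src)
scattered[new_order] = src

# The scatter is the same as a gather with the inverse permutation.
inverse = np.argsort(new_order)

print(gathered.tolist())   # [20, 30, 10]
print(scattered.tolist())  # [30, 10, 20]
```

So to get the gather result with left-hand indexing, the index would have to be np.argsort(new_order) rather than new_order itself.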

Indexing on the right gets around this, but this is not in-place. The variable a points to a new object.

In [731]: a=a[:,[0,1,3,2]]
In [732]: a
Out[732]: 
array([[ 0,  1,  3,  2],
       [ 4,  5,  7,  6],
       [ 8,  9, 11, 10]])

However assignment with [:] may solve this:

In [738]: a=np.arange(12).reshape(3,4)
In [739]: a.__array_interface__
Out[739]: 
{'data': (181868592, False),   # 181... is the id of the data buffer
 'descr': [('', '<i4')],
 'shape': (3, 4),
 'strides': None,
 'typestr': '<i4',
 'version': 3}
In [740]: a[:]=a[:,[0,1,3,2]]
In [741]: a.__array_interface__
Out[741]: 
{'data': (181868592, False),  # same data buffer
 'descr': [('', '<i4')],
 'shape': (3, 4),
 'strides': None,
 'typestr': '<i4',
 'version': 3}
In [742]: a
Out[742]: 
array([[ 0,  1,  3,  2],
       [ 4,  5,  7,  6],
       [ 8,  9, 11, 10]])

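The fact that the address of the data buffer is the same indicates that a's memory was reused rather than replaced. Note that the right-hand side, a[:,[0,1,3,2]], is still built as a temporary copy before it is written back, so this is in-place only in the sense that the buffer is preserved. It would be good to test this with other indexing to make sure it does what you want.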
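
One way to run such a test, sketched here for the question's 3D case: compare the buffer address before and after the assignment, and compare the values against a reference computed separately. The variable names are illustrative.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)
new_order = [1, 2, 0]

expected = a[:, :, new_order]   # reference result in its own buffer
address_before = a.__array_interface__['data'][0]
view = a[:]                     # a view onto a's buffer, taken beforehand

# The right-hand side is evaluated into a temporary first, then copied
# element by element into a's existing buffer.
a[:] = a[:, :, new_order]

address_after = a.__array_interface__['data'][0]
same_buffer = address_before == address_after
values_ok = np.array_equal(a, expected)
view_follows = np.array_equal(view, expected)
print(same_buffer, values_ok, view_follows)  # True True True
```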

But, is ‘inplace’ sorting necessary? If the array is very large it might be needed to avoid memory errors. But we’d have to test the alternatives to see if they work.

In-place also matters if some other variable uses the same data. For example

b = a.T # a transpose

With a[:]= the rows of b will be reordered. a and b continue to share the same data. With a=, b is unchanged. a and b are now decoupled.
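
A minimal sketch of both cases, using the 3 x 4 array from above (the names b_inplace and b_rebound are only for illustration):

```python
import numpy as np

order = [0, 1, 3, 2]

# Case 1: write into the existing buffer; the transposed view follows.
a = np.arange(12).reshape(3, 4)
b_inplace = a.T
a[:] = a[:, order]
coupled = np.shares_memory(a, b_inplace)
print(coupled)                    # True
print(b_inplace[:, 0].tolist())   # [0, 1, 3, 2]

# Case 2: rebind the name; the view keeps the old data in the old order.
a = np.arange(12).reshape(3, 4)
b_rebound = a.T
a = a[:, order]
decoupled = not np.shares_memory(a, b_rebound)
print(decoupled)                  # True
print(b_rebound[:, 0].tolist())   # [0, 1, 2, 3]
```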

Answered By: hpaulj

Unfortunately, numpy does not have a builtin solution for this. The only way is to either use some clever assignments or to write your own custom method.

Using cycle detection, an additional set for remembering indices, and an auxiliary array for caching one slice along the axis, I wrote a custom method for this that should be useful for reordering large ndarrays:

import numpy as np

def put_at(index, axis=-1, slc=(slice(None),)):
    """Gets the numpy indexer for the given index based on the axis."""
    return (axis < 0)*(Ellipsis,) + axis*slc + (index,) + (-1-axis)*slc


def reorder_inplace(array, new_order, axis=0):
    """
    Reindex (reorder) the array along an axis.

    :param array: The array to reindex.
    :param new_order: A list with the new index order. Must be a valid permutation.
    :param axis: The axis to reindex.
    """
    if np.size(array, axis=axis) != len(new_order):
        raise ValueError(
            'The new order did not match the indexed array along dimension {0}; '
            'dimension is {1} but the new order has length {2}'.format(
                axis, np.size(array, axis=axis), len(new_order)
            )
        )

    visited = set()
    for index, source in enumerate(new_order):
        if index not in visited and index != source:
            initial_values = np.take(array, index, axis=axis).copy()

            destination = index
            visited.add(destination)
            while source != index:
                if source in visited:
                    raise IndexError(
                        'The new order is not unique; '
                        'duplicate found at position {0} with value {1}'.format(
                            destination, source
                        )
                    )

                array[put_at(destination, axis=axis)] = array.take(source, axis=axis)

                destination = source
                source = new_order[destination]

                visited.add(destination)
            array[put_at(destination, axis=axis)] = initial_values

Example:

In[4]: a = np.arange(15).reshape(3, 5)
In[5]: a
Out[5]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Reorder on axis 0:

In[6]: reorder_inplace(a, [2, 0, 1], axis=0)
In[7]: a
Out[7]: 
array([[10, 11, 12, 13, 14],
       [ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9]])

Reorder on axis 1 (starting again from the original a):

In[10]: reorder_inplace(a, [3, 2, 0, 4, 1], axis=1)
In[11]: a
Out[11]: 
array([[ 3,  2,  0,  4,  1],
       [ 8,  7,  5,  9,  6],
       [13, 12, 10, 14, 11]])
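
For comparison, the same cycle-following idea can be sketched more compactly by taking a view with the target axis moved to the front. This is a variation for illustration, not the function above: the name reorder_inplace_sketch is made up, and it validates the permutation up front with sorted() instead of detecting duplicates inside the loop.

```python
import numpy as np

def reorder_inplace_sketch(array, new_order, axis=0):
    """Apply result[i] = old[new_order[i]] along `axis` by following cycles."""
    new_order = [int(i) for i in new_order]
    if sorted(new_order) != list(range(array.shape[axis])):
        raise ValueError("new_order must be a permutation of the axis indices")

    # View with the target axis first; writes go through to `array`.
    view = np.moveaxis(array, axis, 0)
    done = [False] * len(new_order)
    for start in range(len(new_order)):
        if done[start] or new_order[start] == start:
            done[start] = True
            continue
        saved = view[start].copy()   # one slice of scratch space per cycle
        dest = start
        while new_order[dest] != start:
            # The source slot has not been overwritten yet within this cycle.
            view[dest] = view[new_order[dest]]
            done[dest] = True
            dest = new_order[dest]
        view[dest] = saved           # close the cycle with the saved slice
        done[dest] = True
    return array

a = np.arange(15).reshape(3, 5)
reorder_inplace_sketch(a, [3, 2, 0, 4, 1], axis=1)
print(a[0].tolist())  # [3, 2, 0, 4, 1]
```

Checking the result against np.take(original, new_order, axis=axis) on a few random permutations is a cheap way to confirm either version before trusting it on a large array.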

Timing and memory for small array of 1000 x 1000

In[5]: a = np.arange(1000 * 1000).reshape(1000, 1000)
In[6]: %timeit reorder_inplace(a, np.random.permutation(1000))
8.19 ms ± 18.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In[7]: %memit reorder_inplace(a, np.random.permutation(1000))
peak memory: 81.75 MiB, increment: 0.49 MiB
In[8]: %timeit a[:] = a[np.random.permutation(1000), :]
3.27 ms ± 9.49 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In[9]: %memit a[:] = a[np.random.permutation(1000), :]
peak memory: 89.56 MiB, increment: 0.01 MiB

For a small array, the memory consumption is not very different, but the numpy version is about 2.5 times faster (3.27 ms versus 8.19 ms).

Timing and memory for 20000 x 20000

In[5]: a = np.arange(20000 * 20000).reshape(20000, 20000)
In[6]: %timeit reorder_inplace(a, np.random.permutation(20000))
1.16 s ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In[7]: %memit reorder_inplace(a, np.random.permutation(20000))
peak memory: 3130.77 MiB, increment: 0.19 MiB
In[8]: %timeit a[:] = a[np.random.permutation(20000), :]
1.84 s ± 2.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In[9]: %memit a[:] = a[np.random.permutation(20000), :]
peak memory: 6182.80 MiB, increment: 3051.76 MiB

For the 20000 x 20000 array, the numpy version is slower (1.84 s versus 1.16 s), and its memory increment is about 3 GiB (3051.76 MiB), which is the size of a temporary copy of the whole array. The custom in-place reordering uses a negligible amount (0.19 MiB).

Answered By: Vali Rosca