return indices from filtered, sorted array with numpy

Question:

What is the most straightforward way to do the following in Python/NumPy?

  • begin with a random array x
  • filter x down to the elements with x < 0.5
  • sort the remaining values by size
  • return the indices of the (original) x corresponding to these values
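For reference, the four steps can be written as a plain-Python loop. This is a slow but unambiguous sketch, assuming the filter keeps the elements with x < 0.5, which is how the asker's own answer reads it. The helper name reference_indices is hypothetical. It is mainly useful for checking faster NumPy versions against.

```python
import numpy as np

def reference_indices(x, thresh=0.5):
    """Hypothetical helper: indices i with x[i] < thresh, ordered by x[i]."""
    # Filter step: keep the original positions whose value passes the test
    kept = [i for i in range(len(x)) if x[i] < thresh]
    # Sort step: order those positions by the value they point to
    return np.array(sorted(kept, key=lambda i: x[i]), dtype=np.intp)

x = np.array([0.96924269, 0.30592608, 0.03338015, 0.64815553])
print(reference_indices(x))  # → [2 1]
```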
Asked By: anon01


Answers:

One solution:

  • create a sorted index array (argsort)
  • create a mask for the sorted x values less than the threshold
  • apply the mask to the sorted index array

Example:

import numpy as np

# x = np.random.rand(4)
x = np.array([0.96924269, 0.30592608, 0.03338015, 0.64815553])
solution = np.array([2, 1])

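sorted_idx = np.argsort(x)                  # indices that sort x ascending
idx_mask = (x[sorted_idx] < 0.5)            # threshold test, in sorted order
sorted_filtered_idx = sorted_idx[idx_mask]  # keep only the indices that pass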

assert np.all(sorted_filtered_idx == solution)
Answered By: anon01

Computing the mask x < 0.5 and x.argsort() seems unavoidable here. Once you have both, reorder the mask with the sort indices (mask[sort_idx]) so that it lines up with the sorted order, then use that reordered mask on the sort indices themselves. This keeps only the sorted indices whose elements satisfy the condition, so it takes just one more line of code, like so:

mask = x < 0.5
sort_idx = x.argsort()
out = sort_idx[mask[sort_idx]]

Sample step-by-step run:

In [56]: x
Out[56]: array([ 0.8974009 ,  0.30127187,  0.71187137,  0.04041124])

In [57]: mask
Out[57]: array([False,  True, False,  True], dtype=bool)

In [58]: sort_idx
Out[58]: array([3, 1, 2, 0])

In [59]: mask[sort_idx]
Out[59]: array([ True,  True, False, False], dtype=bool)

In [60]: sort_idx[mask[sort_idx]]
Out[60]: array([3, 1])
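Indexing the mask with sort_idx, as in mask[sort_idx], yields the same boolean array as evaluating the condition on the sorted values, x[sort_idx] < 0.5, which is the step used in anon01's answer. The two answers therefore return identical indices. A quick randomized check (the seed and array size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
x = rng.random(1000)

sort_idx = x.argsort()
mask = x < 0.5

# Divakar: reorder the precomputed mask into sorted order
out_divakar = sort_idx[mask[sort_idx]]
# anon01: recompute the condition on the sorted values
out_anon01 = sort_idx[x[sort_idx] < 0.5]

print(np.array_equal(out_divakar, out_anon01))  # → True
```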
Answered By: Divakar

Masked arrays are concise (but maybe not especially efficient):

import numpy as np

x = np.random.rand(4)

inverse_mask = x < 0.5  # True for the elements to keep
m_x = np.ma.array(x, mask=np.logical_not(inverse_mask))
sorted_indices = m_x.argsort(fill_value=1)
filtered_sorted_indices = sorted_indices[:np.sum(inverse_mask)]
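The fill_value=1 trick relies on np.random.rand returning values in [0, 1), so every masked entry sorts after every kept one. A variant that drops that assumption, sketched with standard numpy.ma calls: np.ma.masked_greater_equal masks the rejected entries directly, MaskedArray.argsort places masked entries last by default (its endwith=True), and MaskedArray.count() gives the number of unmasked entries to keep.

```python
import numpy as np

x = np.array([0.96924269, 0.30592608, 0.03338015, 0.64815553])

# Mask (hide) every element that fails the x < 0.5 filter
m_x = np.ma.masked_greater_equal(x, 0.5)
# Masked entries sort last by default, so no data-dependent fill value is needed
sorted_indices = m_x.argsort()
# Keep one index per unmasked (kept) element
filtered_sorted_indices = sorted_indices[:m_x.count()]

print(filtered_sorted_indices)  # → [2 1]
```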
Answered By: Waylon Flinn

While the original query asks for the "most straightforward" approach, the answers above don’t filter the array first and then sort only the remainder, as posed. Sadly, NumPy does not make this easy.

AGN Glazer in this SO post provides:

import numpy as np

def meth_agn_v2(x, thresh):
    # keeps x > thresh; flip to x < thresh for the question's filter
    idx, = np.where(x > thresh)
    return idx[np.argsort(x[idx])]

However, np.where does not accept an axis= argument, so, unlike np.argsort, this can’t be generalized to multidimensional arrays (at least not that I can see).
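One possible way around the axis limitation, sketched under the assumption that a ragged result (one index array per row) is acceptable: replace the rejected entries with np.inf using the three-argument form of np.where, which works elementwise on any shape and needs no axis, then argsort along the last axis and keep the first count entries of each row. Every kept value is strictly less than the threshold, so it always sorts before the np.inf placeholders. The function name filtered_argsort_rows is hypothetical.

```python
import numpy as np

def filtered_argsort_rows(x, thresh=0.5):
    """Hypothetical helper: per row, indices with x < thresh, ordered by value."""
    keep = x < thresh
    # Push rejected entries to the end of each row's sort order
    filled = np.where(keep, x, np.inf)
    order = np.argsort(filled, axis=-1)
    counts = keep.sum(axis=-1)
    # Rows can keep different numbers of elements, so return a list of arrays
    return [row[:n] for row, n in zip(order, counts)]

x = np.array([[0.96924269, 0.30592608, 0.03338015, 0.64815553],
              [0.10, 0.90, 0.40, 0.20]])
print([r.tolist() for r in filtered_argsort_rows(x)])  # → [[2, 1], [0, 3, 2]]
```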

For a small number of results num, Ari Cooper-Davis’ answer led me to this result:

import numpy as np

sorted_posns = list(range(num))  # num: how many sorted results you want
b = np.argpartition(x, sorted_posns)

The first num entries of b are the indices of the num smallest elements of x, in sorted order, but you have to supply num (the expected number of sorted entries) in advance. The remaining entries of b are in unspecified order.
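If num is chosen as the number of elements that pass the filter, the first num entries of b are exactly the filtered indices in sorted order, because the values below 0.5 are the num smallest values in x. A sketch under that assumption; the num == 0 guard simply sidesteps passing an empty kth sequence. As the answer notes, this is most attractive when num is small compared with len(x).

```python
import numpy as np

x = np.array([0.96924269, 0.30592608, 0.03338015, 0.64815553])

# num = how many elements pass the filter; they are also the num smallest
num = np.count_nonzero(x < 0.5)
if num == 0:
    result = np.array([], dtype=np.intp)
else:
    # Move each of positions 0..num-1 into its sorted place in one call
    b = np.argpartition(x, np.arange(num))
    result = b[:num]

print(result)  # → [2 1]
```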

Answered By: robm