return indices from filtered, sorted array with numpy

Question

What is the most straightforward way to do the following in python/numpy?

begin with random array x
filter out elements x < .5
sort the remaining values by size
return indices of (original) x corresponding to these values

Asked By: anon01


Answer 1

One solution:

create a sorted index array (argsort)
create a mask for the sorted x values below the threshold
apply the mask to the sorted index array

Example:

import numpy as np

# x = np.random.rand(4)
x = np.array([0.96924269, 0.30592608, 0.03338015, 0.64815553])
solution = np.array([2, 1])

sorted_idx = np.argsort(x)
idx_mask = (x[sorted_idx] < 0.5)
sorted_filtered_idx = sorted_idx[idx_mask]

assert np.all(sorted_filtered_idx == solution)

Answered By: anon01

Answer 2

Computing the mask x < 0.5 and x.argsort() both seem unavoidable here. Once you have those two, reorder the mask with the sort indices, then use that reordered mask to index the sort indices. This yields the indices into the original array, in sorted order, for the elements that satisfy the condition. It takes just one more line of code:

mask = x < 0.5
sort_idx = x.argsort()
out = sort_idx[mask[sort_idx]]

Sample step-by-step run:

In [56]: x
Out[56]: array([ 0.8974009 ,  0.30127187,  0.71187137,  0.04041124])

In [57]: mask
Out[57]: array([False,  True, False,  True], dtype=bool)

In [58]: sort_idx
Out[58]: array([3, 1, 2, 0])

In [59]: mask[sort_idx]
Out[59]: array([ True,  True, False, False], dtype=bool)

In [60]: sort_idx[mask[sort_idx]]
Out[60]: array([3, 1])
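
For reference, the same steps can be wrapped in a small function and checked against the sample values from the run above. This is only a sketch: the function name sorted_indices_below is made up for illustration and is not part of NumPy or of the original answer.

```python
import numpy as np

def sorted_indices_below(x, thresh):
    """Indices into x of the elements below thresh, ordered by value.

    Hypothetical helper for illustration only.
    """
    sort_idx = np.argsort(x)                # indices that sort the whole array
    return sort_idx[x[sort_idx] < thresh]   # keep only those whose value passes

# Sample values copied from the step-by-step run
x = np.array([0.8974009, 0.30127187, 0.71187137, 0.04041124])
result = sorted_indices_below(x, 0.5)
print(result)  # [3 1]
```

Note that x[sort_idx] < thresh is the same array as mask[sort_idx], so this is equivalent to the three-line version.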

Answered By: Divakar

Answer 3

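Masked arrays are concise (but maybe not especially efficient):

import numpy as np

x = np.random.rand(4)

# True for the elements to keep
inverse_mask = x < 0.5
# np.ma hides entries where mask is True, so invert the keep-mask
m_x = np.ma.array(x, mask=np.logical_not(inverse_mask))
# masked entries are filled with 1 (above any value from rand), so they sort last
sorted_indices = m_x.argsort(fill_value=1)
filtered_sorted_indices = sorted_indices[:np.sum(inverse_mask)]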

Answered By: Waylon Flinn

Answer 4

While the original question asks for the "most straightforward" approach, the answers above don’t filter the array first and then sort only the remainder, as posed. Sadly, NumPy does not make this easy.

AGN Glazer in this SO post provides:

def meth_agn_v2(x, thresh):
    idx, = np.where(x > thresh)
    return idx[np.argsort(x[idx])]

However, np.where does not accept an axis= argument, so unlike np.argsort this can’t be generalized to multidimensional arrays (at least not that I can see).
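
One possible workaround for the multidimensional case, offered as a sketch and not as part of the original answer: replace the rejected elements with +inf so they sort to the end, argsort along the chosen axis, and count how many elements pass per row. The function name and the return convention below are assumptions made for illustration. This still sorts the full array, so it does not achieve the "filter first" goal, and it assumes x contains no +inf values that pass the threshold.

```python
import numpy as np

def sorted_filtered_idx_nd(x, thresh, axis=-1):
    """Hypothetical helper: argsort along axis with rejected elements pushed last.

    Returns (order, counts); along the axis, the first counts[...] entries of
    order are the indices of the elements > thresh, sorted by value.
    """
    keep = x > thresh
    # Three-argument np.where works element-wise: rejected values become +inf
    order = np.argsort(np.where(keep, x, np.inf), axis=axis)
    counts = keep.sum(axis=axis)
    return order, counts

x = np.array([[0.9, 0.2, 0.7],
              [0.1, 0.6, 0.3]])
order, counts = sorted_filtered_idx_nd(x, 0.5)

# The per-row results are ragged, so collect them in a list
rows = [order[i, :counts[i]] for i in range(x.shape[0])]
print(rows[0])  # [2 0]
print(rows[1])  # [1]
```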

For a small number of results num, Ari Cooper-Davis’ answer led me to this result:

sortedPosns = list(range(num))
b = np.argpartition(x, sortedPosns)

The result array b provides the first num indexes for the input array x in sorted order, but you have to provide the expected number of sorted entries. The remaining indexes in b will have undefined order.
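
A small worked example may help here. It assumes num is obtained by counting the elements below the threshold, which is my own choice for illustration; the answer leaves open how num is determined. Because argpartition places the smallest values first, this matches the x < 0.5 reading of the question.

```python
import numpy as np

x = np.array([0.96924269, 0.30592608, 0.03338015, 0.64815553])

# Assumption: num is the number of elements below the threshold
num = int(np.count_nonzero(x < 0.5))

# Passing every position 0..num-1 as kth puts each of them in its final sorted place
b = np.argpartition(x, list(range(num)))

# Only the first num entries have a guaranteed order
smallest_sorted_idx = b[:num]
print(smallest_sorted_idx)  # [2 1]
```

If num can be zero, it is probably safest to handle that case separately before calling argpartition.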

Answered By: robm
