return indices from filtered, sorted array with numpy
Question:
What is the most straightforward way to do the following in python/numpy?
- begin with random array x
- filter out elements x < .5
- sort the remaining values by size
- return indices of (original) x corresponding to these values
Answers:
One solution:
- create a sorted index array (argsort)
- create a mask that is True where the sorted x is less than the threshold
- apply the mask to the sorted index array
example:
import numpy as np
# x = np.random.rand(4)
x = np.array([0.96924269, 0.30592608, 0.03338015, 0.64815553])
solution = np.array([2, 1])
sorted_idx = np.argsort(x)
idx_mask = (x[sorted_idx] < 0.5)
sorted_filtered_idx = sorted_idx[idx_mask]
assert np.all(sorted_filtered_idx == solution)
Computing both the mask x < 0.5 and x.argsort() seems unavoidable here. Once you have those two, reorder the mask with the sort indices and use the reordered mask to index the sort indices. This yields the indices of the elements that satisfy the condition, in sorted order. That adds just one more line of code:
mask = x < 0.5
sort_idx = x.argsort()
out = sort_idx[mask[sort_idx]]
Sample step-by-step run –
In [56]: x
Out[56]: array([ 0.8974009 , 0.30127187, 0.71187137, 0.04041124])
In [57]: mask
Out[57]: array([False, True, False, True], dtype=bool)
In [58]: sort_idx
Out[58]: array([3, 1, 2, 0])
In [59]: mask[sort_idx]
Out[59]: array([ True, True, False, False], dtype=bool)
In [60]: sort_idx[mask[sort_idx]]
Out[60]: array([3, 1])
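The steps above can be wrapped into a small helper. This is only a sketch: the function name `sorted_indices_below` and its `thresh` parameter are made up for illustration and do not appear in the answers.

```python
import numpy as np

def sorted_indices_below(x, thresh):
    """Indices of x whose values are below thresh, ordered by ascending value.

    Hypothetical helper name; it only packages the argsort-then-mask steps.
    """
    x = np.asarray(x)
    sort_idx = np.argsort(x)                # indices that would sort all of x
    return sort_idx[x[sort_idx] < thresh]   # keep those pointing at small values

x = np.array([0.8974009, 0.30127187, 0.71187137, 0.04041124])
print(sorted_indices_below(x, 0.5))  # [3 1]
```

Note that this still sorts the whole array and filters afterwards, exactly as in the session above.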
Masked arrays are concise (but maybe not especially efficient):
x = np.random.rand(4)
inverse_mask = x < 0.5
m_x = np.ma.array(x, mask=np.logical_not(inverse_mask))
# masked entries are filled with 1, which is larger than any value from rand, so they sort last
sorted_indices = m_x.argsort(fill_value=1)
filtered_sorted_indices = sorted_indices[:np.sum(inverse_mask)]
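As a quick sanity check, the masked-array route can be compared with the plain argsort route on the fixed array from the first answer. The variable names `keep`, `via_masked` and `via_argsort` are illustrative only. The check relies on the fill value (1 here) being larger than every kept value; a smaller fill value could let masked entries sort in among the kept ones.

```python
import numpy as np

x = np.array([0.96924269, 0.30592608, 0.03338015, 0.64815553])
keep = x < 0.5

# Masked-array route: rejected values are hidden and filled with 1,
# so they end up after all kept values in the sort order.
m_x = np.ma.array(x, mask=~keep)
via_masked = m_x.argsort(fill_value=1)[:np.count_nonzero(keep)]

# Plain route: argsort everything, then keep indices whose value passes.
sort_idx = np.argsort(x)
via_argsort = sort_idx[keep[sort_idx]]

print(via_masked, via_argsort)  # [2 1] [2 1]
```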
The question asks for the "most straightforward" approach, but the answers above sort the whole array and filter afterwards, rather than filtering first and sorting only the remainder as posed. Sadly, NumPy does not make this easy.
AGN Glazer in this SO post provides:
def meth_agn_v2(x, thresh):
    idx, = np.where(x > thresh)
    return idx[np.argsort(x[idx])]
but np.where does not support the axis= argument, so this can't be generalized to multidimensional arrays the way np.argsort can (at least not that I can see).
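One possible workaround for the multidimensional case, offered only as a sketch: replace rejected values with +inf, argsort along the chosen axis, and count the kept entries per slice. This is not the filter-first method above (it sorts every element), and the name `sorted_kept_indices` is hypothetical. It follows the `x > thresh` convention of `meth_agn_v2` and assumes x itself contains no +inf values.

```python
import numpy as np

def sorted_kept_indices(x, thresh, axis=-1):
    """Hypothetical helper: per-slice sorted indices of entries above thresh.

    Returns (order, counts). Along `axis`, the first counts[...] entries of
    order are the indices of kept values in ascending order; the rest point
    at rejected values and should be ignored.
    """
    keep = x > thresh
    # Rejected entries become +inf so they sort to the end of each slice.
    order = np.argsort(np.where(keep, x, np.inf), axis=axis)
    counts = np.count_nonzero(keep, axis=axis)
    return order, counts

x = np.array([[0.9, 0.3, 0.7],
              [0.1, 0.6, 0.2]])
order, counts = sorted_kept_indices(x, 0.5)
print(order[0, :counts[0]])  # [2 0]
print(order[1, :counts[1]])  # [1]
```

Because each slice can keep a different number of elements, the result is ragged, which is why the sketch returns the counts alongside the full index array instead of a single trimmed array.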
For a small number of results num, Ari Cooper-Davis' answer led me to this result:
sortedPosns = list(range(num))
b = np.argpartition(x, sortedPosns)
The first num entries of b are the indices of the num smallest values of x, in sorted order, but you have to supply num in advance. The remaining entries of b are in undefined order.
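For the thresholding problem in the question, num could be taken as the number of elements passing the filter. The following is a sketch under that assumption; the helper name `smallest_sorted_indices` is invented here, and the empty-result guard is a precaution rather than something the original answer discusses.

```python
import numpy as np

def smallest_sorted_indices(x, num):
    """Hypothetical helper: indices of the num smallest values of x, ascending."""
    if num == 0:
        # Nothing requested; skip argpartition entirely.
        return np.empty(0, dtype=np.intp)
    # Passing a sequence as kth puts each listed position into its sorted place.
    return np.argpartition(x, list(range(num)))[:num]

x = np.array([0.96924269, 0.30592608, 0.03338015, 0.64815553])
num = np.count_nonzero(x < 0.5)          # how many values pass the filter
print(smallest_sorted_indices(x, num))   # [2 1]
```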