# How do I get indices of N maximum values in a NumPy array?

## Question:

NumPy provides a way to get the index of the maximum value of an array via `np.argmax`.

I would like a similar thing, but returning the indexes of the `N` maximum values.

For instance, if I have an array, `[1, 3, 2, 4, 5]`, then `nargmax(array, n=3)` would return the indices `[4, 3, 1]` which correspond to the elements `[5, 4, 3]`.

The simplest I’ve been able to come up with is:

``````>>> import numpy as np
>>> arr = np.array([1, 3, 2, 4, 5])
>>> arr.argsort()[-3:][::-1]
array([4, 3, 1])
``````

This involves a complete sort of the array. I wonder if `numpy` provides a built-in way to do a partial sort; so far I haven’t been able to find one.

If this solution turns out to be too slow (especially for small `n`), it may be worth looking at coding something up in Cython.

## Answers:

This will be faster than a full sort depending on the size of your original array and the size of your selection:

``````>>> A = np.random.randint(0,10,10)
>>> A
array([5, 1, 5, 5, 2, 3, 2, 4, 1, 0])
>>> B = np.zeros(3, int)
>>> for i in range(3):
...     idx = np.argmax(A)
...     B[i] = idx; A[idx] = A.min() - 1  # mark as used with something smaller than A.min()
...
>>> B
array([0, 2, 3])
``````

This, of course, tampers with your original array, which you could fix (if needed) by making a copy or by restoring the original values afterwards, whichever is cheaper for your use case.

Use:

``````>>> import heapq
>>> import numpy
>>> a = numpy.array([1, 3, 2, 4, 5])
>>> heapq.nlargest(3, range(len(a)), a.take)
[4, 3, 1]
``````

For regular Python lists:

``````>>> a = [1, 3, 2, 4, 5]
>>> heapq.nlargest(3, range(len(a)), a.__getitem__)
[4, 3, 1]
``````

If you use Python 2, use `xrange` instead of `range`.

Source: heapq — Heap queue algorithm

Newer NumPy versions (1.8 and up) have a function called `argpartition` for this. To get the indices of the four largest elements, do

``````>>> a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])

>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])

>>> top4 = a[ind]
>>> top4
array([4, 9, 6, 9])
``````

Unlike `argsort`, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating `a[ind]`. If you need that too, sort them afterwards:

``````>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])
``````

To get the top-k elements in sorted order in this way takes O(n + k log k) time.
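The two steps above (partition, then sort only the selected candidates) can be wrapped in a small helper. This is a minimal sketch for 1-D input; the name `top_n_indices` is hypothetical, and it assumes `1 <= n <= arr.size`.

```python
import numpy as np

def top_n_indices(arr, n):
    """Indices of the n largest values of a 1-D array, largest first."""
    arr = np.asarray(arr)
    # Partial selection: the last n positions hold the n largest values,
    # in no particular order.
    part = np.argpartition(arr, -n)[-n:]
    # Sort only those n candidates by value, descending.
    return part[np.argsort(arr[part])[::-1]]

a = np.array([1, 3, 2, 4, 5])
print(top_n_indices(a, 3))  # [4 3 1]
```

When values are tied, the relative order of the tied indices is not guaranteed.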

Simpler yet:

``````idx = (-arr).argsort()[:n]
``````

where n is the number of maximum values.
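As a quick check of the one-liner above, here is a small sketch on the array from the question. The reversed-slice variant is shown only as an alternative for dtypes where negation is not appropriate (for example unsigned integers or booleans); both still perform a full sort.

```python
import numpy as np

arr = np.array([1, 3, 2, 4, 5])
n = 3

# Negate, then sort ascending: the largest original values come first.
idx_neg = (-arr).argsort()[:n]

# Alternative without negation: sort ascending, reverse, take the first n.
idx_rev = arr.argsort()[::-1][:n]

print(idx_neg, idx_rev)  # [4 3 1] [4 3 1]
```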

Use:

``````from operator import itemgetter
from heapq import nlargest
result = nlargest(N, enumerate(your_list), itemgetter(1))
``````

The `result` list then contains N tuples (`index`, `value`), ordered from the largest `value` down.
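For a concrete picture, here is the same call on the list from the question. The names `indices` and `values` are only illustrative.

```python
from heapq import nlargest
from operator import itemgetter

your_list = [1, 3, 2, 4, 5]
N = 3

# Each item is an (index, value) pair; itemgetter(1) ranks the pairs by value.
result = nlargest(N, enumerate(your_list), key=itemgetter(1))
print(result)  # [(4, 5), (3, 4), (1, 3)]

# Split into separate index and value lists if needed.
indices = [i for i, _ in result]
values = [v for _, v in result]
```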

If you don’t care about the order of the K largest elements, you can use `argpartition`, which should perform better than a full sort through `argsort`.

``````import numpy as np

K = 4  # We want the indices of the four largest values
a = np.array([0, 8, 0, 4, 5, 8, 8, 0, 4, 2])
np.argpartition(a, -K)[-K:]
# array([4, 1, 5, 6])
``````

Credits go to this question.

I ran a few tests and it looks like `argpartition` outperforms `argsort` as the size of the array and the value of K increase.

If you happen to be working with a multidimensional array then you’ll need to flatten and unravel the indices:

``````def largest_indices(ary, n):
    """Returns the indices of the n largest values in a numpy array."""
    flat = ary.flatten()
    indices = np.argpartition(flat, -n)[-n:]
    indices = indices[np.argsort(-flat[indices])]
    return np.unravel_index(indices, ary.shape)
``````

For example:

``````>>> xs = np.sin(np.arange(9)).reshape((3, 3))
>>> xs
array([[ 0.        ,  0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.2794155 ,  0.6569866 ,  0.98935825]])
>>> largest_indices(xs, 3)
(array([2, 0, 0]), array([2, 2, 1]))
>>> xs[largest_indices(xs, 3)]
array([ 0.98935825,  0.90929743,  0.84147098])
``````

For multidimensional arrays you can use the `axis` keyword in order to apply the partitioning along the expected axis.

``````# For a 2D array
indices = np.argpartition(arr, -N, axis=1)[:, -N:]
``````

And for grabbing the items:

``````x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)
``````

But note that this won’t return a sorted result. In that case you can use `np.argsort()` along the intended axis:

``````indices = np.argsort(arr, axis=1)[:, -N:]

# Result
x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)
``````

Here is an example:

``````In [42]: a = np.random.randint(0, 20, (10, 10))

In [44]: a
Out[44]:
array([[ 7, 11, 12,  0,  2,  3,  4, 10,  6, 10],
[16, 16,  4,  3, 18,  5, 10,  4, 14,  9],
[ 2,  9, 15, 12, 18,  3, 13, 11,  5, 10],
[14,  0,  9, 11,  1,  4,  9, 19, 18, 12],
[ 0, 10,  5, 15,  9, 18,  5,  2, 16, 19],
[14, 19,  3, 11, 13, 11, 13, 11,  1, 14],
[ 7, 15, 18,  6,  5, 13,  1,  7,  9, 19],
[11, 17, 11, 16, 14,  3, 16,  1, 12, 19],
[ 2,  4, 14,  8,  6,  9, 14,  9,  1,  5],
[ 1, 10, 15,  0,  1,  9, 18,  2,  2, 12]])

In [45]: np.argpartition(a, np.argmin(a, axis=0))[:, 1:] # 1 is because the first item is the minimum one.
Out[45]:
array([[4, 5, 6, 8, 0, 7, 9, 1, 2],
[2, 7, 5, 9, 6, 8, 1, 0, 4],
[5, 8, 1, 9, 7, 3, 6, 2, 4],
[4, 5, 2, 6, 3, 9, 0, 8, 7],
[7, 2, 6, 4, 1, 3, 8, 5, 9],
[2, 3, 5, 7, 6, 4, 0, 9, 1],
[4, 3, 0, 7, 8, 5, 1, 2, 9],
[5, 2, 0, 8, 4, 6, 3, 1, 9],
[0, 1, 9, 4, 3, 7, 5, 2, 6],
[0, 4, 7, 8, 5, 1, 9, 2, 6]])

In [46]: np.argpartition(a, np.argmin(a, axis=0))[:, -3:]
Out[46]:
array([[9, 1, 2],
[1, 0, 4],
[6, 2, 4],
[0, 8, 7],
[8, 5, 9],
[0, 9, 1],
[1, 2, 9],
[3, 1, 9],
[5, 2, 6],
[9, 2, 6]])

In [89]: a[np.repeat(np.arange(x), 3), ind.ravel()].reshape(x, 3)  # x = a.shape[0]; ind is Out[46]
Out[89]:
array([[10, 11, 12],
[16, 16, 18],
[13, 15, 18],
[14, 18, 19],
[16, 18, 19],
[14, 14, 19],
[15, 18, 19],
[16, 17, 19],
[ 9, 14, 14],
[12, 15, 18]])
``````
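For the row-wise case above, `np.take_along_axis` (available in NumPy 1.15 and later) can replace the `np.repeat`/`ravel` indexing when gathering the values. The following is a minimal sketch on a small made-up array, not the 10x10 array from the session.

```python
import numpy as np

arr = np.array([[7, 11, 12, 0],
                [16, 4, 18, 5],
                [2, 15, 9, 13]])
N = 2

# Column indices of the N largest entries in each row (unordered within a row).
indices = np.argpartition(arr, -N, axis=1)[:, -N:]

# Gather the corresponding values row by row.
top_values = np.take_along_axis(arr, indices, axis=1)

# Optional: order each row's results from largest to smallest.
order = np.argsort(-top_values, axis=1)
indices_sorted = np.take_along_axis(indices, order, axis=1)
values_sorted = np.take_along_axis(top_values, order, axis=1)
print(values_sorted)
```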

I found it most intuitive to use `np.unique`.

The idea is that `np.unique` with `return_inverse=True` also returns, for each input value, its index into the array of unique values. Comparing those indices with the position of the maximum unique value recovers every position where the maximum occurs.

``````import numpy as np

multi_max = [1, 1, 2, 2, 4, 0, 0, 4]
uniques, idx = np.unique(multi_max, return_inverse=True)
print(np.squeeze(np.argwhere(idx == np.argmax(uniques))))
# [4 7]
``````

The method `np.argpartition` returns only the indices of the k largest elements, performs a partial sort, and is faster than `np.argsort` (which performs a full sort) when the array is quite large. However, the returned indices are NOT in ascending/descending order of value, so if you want the top k indices in strictly sorted order, `np.argpartition` alone won’t return what you want.
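The ordering caveat can be checked with a small experiment. This sketch uses distinct values so that the comparison with `np.argsort` is unambiguous; the seed and array size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.permutation(100)   # 100 distinct values, shuffled
k = 5

part = np.argpartition(a, -k)[-k:]   # top-k indices, arbitrary order
full = np.argsort(a)[-k:][::-1]      # top-k indices, largest first

# Both select the same set of indices...
assert set(part.tolist()) == set(full.tolist())

# ...and sorting just the k candidates reproduces the argsort order.
part_sorted = part[np.argsort(a[part])[::-1]]
assert part_sorted.tolist() == full.tolist()
```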

Apart from sorting manually after `np.argpartition`, my solution is to use `torch.topk` from PyTorch, a tool for neural network construction that provides NumPy-like APIs with both CPU and GPU support. It’s as fast as NumPy with MKL, and offers a GPU boost if you need large matrix/vector calculations.

Note that `torch.topk` accepts a torch tensor and returns both the top k values and the top k indices as `torch.Tensor` objects, which gives the top k indices in strict ascending/descending order of value. As in NumPy, `torch.topk` also accepts an axis argument, so you can handle multi-dimensional arrays/tensors.

Use:

``````def max_indices(arr, k):
    '''
    Returns the indices of the k first largest elements of arr
    (in descending order in values)
    '''
    assert k <= arr.size, 'k should be smaller or equal to the array size'
    arr_ = arr.astype(float)  # make a copy of arr
    max_idxs = []
    for _ in range(k):
        max_element = np.max(arr_)
        if np.isinf(max_element):
            break
        else:
            idx = np.where(arr_ == max_element)
            max_idxs.append(idx)
            arr_[idx] = -np.inf
    return max_idxs
``````

It also works with 2D arrays. For example,

``````In [0]: A = np.array([[ 0.51845014,  0.72528114],
[ 0.88421561,  0.18798661],
[ 0.89832036,  0.19448609],
[ 0.89832036,  0.19448609]])
In [1]: max_indices(A, 8)
Out[1]:
[(array([2, 3], dtype=int64), array([0, 0], dtype=int64)),
(array([1], dtype=int64), array([0], dtype=int64)),
(array([0], dtype=int64), array([1], dtype=int64)),
(array([0], dtype=int64), array([0], dtype=int64)),
(array([2, 3], dtype=int64), array([1, 1], dtype=int64)),
(array([1], dtype=int64), array([1], dtype=int64))]

In [2]: A[max_indices(A, 8)[0]][0]
Out[2]: array([ 0.89832036])
``````

I think the most time-efficient way is to manually iterate through the array and keep a k-size min-heap, as other people have mentioned.

I also came up with a brute-force approach:

``````top_k_index_list = [ ]
for i in range(k):
    top_k_index_list.append(np.argmax(my_array))
    my_array[top_k_index_list[-1]] = -float('inf')
``````

Set the largest element to a large negative value after you use `argmax` to get its index. The next call of `argmax` will then return the index of the second largest element.
You can also record the original values of these elements and restore them afterwards if you want. Note that `-float('inf')` can only be stored in a floating-point array.
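The k-size min-heap idea mentioned at the start of this answer could look roughly like the sketch below. The function name `top_k_heap` is hypothetical. Since it loops in pure Python, it is likely slower than `np.argpartition` on large arrays, but it leaves the input untouched and also works on any iterable.

```python
import heapq
import numpy as np

def top_k_heap(values, k):
    """Indices of the k largest values, largest first, using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []  # holds (value, index) pairs; heap[0] is the smallest kept so far
    for i, v in enumerate(values):
        if len(heap) < k:
            heapq.heappush(heap, (v, i))
        elif v > heap[0][0]:
            # New value beats the smallest kept one: swap it in.
            heapq.heapreplace(heap, (v, i))
    # Largest value first.
    return [i for v, i in sorted(heap, reverse=True)]

my_array = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
print(top_k_heap(my_array, 3))  # [4, 3, 1]
```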

The following is a very easy way to see the maximum elements and their positions. Here `axis` selects the direction: `axis=0` gives the column-wise maximum and `axis=1` gives the row-wise maximum for the 2D case. For higher dimensions, choose the axis you need.

``````M = np.random.random((3, 4))
print(M)
print(M.max(axis=1), M.argmax(axis=1))
``````

This code works for a 2D NumPy array:

``````mat = np.array([[1, 3], [2, 5]])  # 2D NumPy array

n = 2  # number of largest elements wanted
n_largest_mat = np.sort(mat, axis=None)[-n:]  # the n largest values
tf_n_largest = np.zeros(mat.shape, dtype=bool)  # all-False mask, same shape as mat
for x in n_largest_mat:
    tf_n_largest = tf_n_largest | (mat == x)  # mark every position holding one of those values

n_largest_elems = mat[tf_n_largest]  # boolean-mask indexing
``````

This produces a boolean (true-false) mask of the n largest elements, which can also be used to extract those elements from the array.

## Three Answers Compared For Coding Ease And Speed

Speed was important for my needs, so I tested three answers to this question.

Code from those three answers was modified as needed for my specific case.

I then compared the speed of each method.

Coding wise:

1. NPE’s answer was the next most elegant and adequately fast for my needs.
2. Fred Foo’s answer required the most refactoring for my needs but was the fastest. I went with this answer because, even though it took more work, it was not too bad and had significant speed advantages.
3. off99555’s answer was the most elegant, but it is the slowest.

### Complete Code for Test and Comparisons

``````import numpy as np
import time
import random
import sys
from operator import itemgetter
from heapq import nlargest

''' Fake Data Setup '''
a1 = list(range(1000000))
random.shuffle(a1)
a1 = np.array(a1)

''' ################################################ '''
''' NPE's Answer Modified A Bit For My Case '''
t0 = time.time()
indices = np.flip(np.argsort(a1))[:5]
results = []
for index in indices:
    results.append((index, a1[index]))
t1 = time.time()
print(results)
print(t1 - t0)
print()

''' Fred Foo's Answer Modified A Bit For My Case'''
t0 = time.time()
indices = np.argpartition(a1, -6)[-5:]
results = []
for index in indices:
    results.append((a1[index], index))
results.sort(reverse=True)
results = [(b, a) for a, b in results]
t1 = time.time()
print(results)
print(t1 - t0)
print()

''' off99555's Answer - No Modification Needed For My Needs '''
t0 = time.time()
result = nlargest(5, enumerate(a1), itemgetter(1))
t1 = time.time()
print(result)
print(t1 - t0)
``````

## Output with Speed Reports

``````[(631934, 999999), (788104, 999998), (413003, 999997), (536514, 999996), (81029, 999995)]
0.1349949836730957
``````

``````[(631934, 999999), (788104, 999998), (413003, 999997), (536514, 999996), (81029, 999995)]
0.011161565780639648
``````

``````[(631934, 999999), (788104, 999998), (413003, 999997), (536514, 999996), (81029, 999995)]
0.439760684967041
``````

Here’s a more complicated way that increases n if the nth value has ties:

``````>>> def get_top_n_plus_ties(arr, n):
...     sorted_args = np.argsort(-arr)
...     thresh = arr[sorted_args[n - 1]]  # value of the n-th largest element
...     n_ = np.sum(arr >= thresh)
...     return sorted_args[:n_]
...
>>> get_top_n_plus_ties(np.array([2,9,8,3,0,2,8,3,1,9,5]),3)
array([1, 9, 2, 6])
``````

When `top_k` is much smaller than `axis_length`, this is faster than `argsort`.

``````import numpy as np

def get_sorted_top_k(array, top_k=1, axis=-1, reverse=False):
    # reverse=True selects the top_k largest values, reverse=False the smallest
    if reverse:
        axis_length = array.shape[axis]
        partition_index = np.take(np.argpartition(array, kth=-top_k, axis=axis),
                                  range(axis_length - top_k, axis_length), axis)
    else:
        partition_index = np.take(np.argpartition(array, kth=top_k, axis=axis), range(0, top_k), axis)
    top_scores = np.take_along_axis(array, partition_index, axis)
    # resort partition
    sorted_index = np.argsort(top_scores, axis=axis)
    if reverse:
        sorted_index = np.flip(sorted_index, axis=axis)
    top_sorted_scores = np.take_along_axis(top_scores, sorted_index, axis)
    top_sorted_indexes = np.take_along_axis(partition_index, sorted_index, axis)
    # return added: the usage below indexes the result with [1] to get the indexes
    return top_sorted_scores, top_sorted_indexes

if __name__ == "__main__":
    import time
    from sklearn.metrics.pairwise import cosine_similarity

    x = np.random.rand(10, 128)
    y = np.random.rand(1000000, 128)
    z = cosine_similarity(x, y)
    start_time = time.time()
    sorted_index_1 = get_sorted_top_k(z, top_k=3, axis=1, reverse=True)[1]
    print(time.time() - start_time)
``````

You can simply use a dictionary to find the top k values and indices in a NumPy array.
For example, to find the top 2 maximum values and indices:

``````import numpy as np
nums = np.array([0.2, 0.3, 0.25, 0.15, 0.1])

def TopK(x, k):
    a = dict([(i, j) for i, j in enumerate(x)])
    sorted_a = dict(sorted(a.items(), key = lambda kv:kv[1], reverse=True))
    indices = list(sorted_a.keys())[:k]
    values = list(sorted_a.values())[:k]
    return (indices, values)

print(f"Indices: {TopK(nums, k = 2)[0]}")
print(f"Values: {TopK(nums, k = 2)[1]}")

# Output:
# Indices: [1, 2]
# Values: [0.3, 0.25]
``````

A vectorized 2D implementation using argpartition:

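``````import numpy as np

k = 3
probas = np.array([
    [.6, .1, .15, .15],
    [.1, .6, .15, .15],
    [.3, .1, .6, 0],
])

k_indices = np.argpartition(-probas, k-1, axis=-1)[:, :k]

# adjust indices to apply in flat array
# (this line is missing in the original; reconstructed here as row offset + column index)
k_indices_flat = k_indices + np.arange(probas.shape[0])[:, None] * probas.shape[1]

k_values = probas.flatten()[k_indices_flat]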

# k_indices:
# array([[0, 2, 3],
#        [1, 2, 3],
#        [2, 0, 1]])
# k_values:
# array([[0.6 , 0.15, 0.15],
#        [0.6 , 0.15, 0.15],
#       [0.6 , 0.3 , 0.1 ]])
``````

If you are dealing with NaNs and/or have problems understanding np.argpartition, try pandas.DataFrame.sort_values.

``````import numpy as np
import pandas as pd

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])

df = pd.DataFrame(a, columns=['array'])
max_values = df['array'].sort_values(ascending=False, na_position='last')
ind = max_values[0:3].index.to_list()
``````

This example gives the indices of the 3 largest, not-NaN values. Probably inefficient, but easy to read and customize.
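If pulling in pandas is not desirable, a NumPy-only sketch of the same idea is possible, relying on the fact that NumPy's sort order places NaN at the end. The sample array and variable names below are made up for illustration.

```python
import numpy as np

a = np.array([9.0, np.nan, 4.0, 3.0, np.nan, 6.0])
n = 3

# np.argsort places NaN at the end, and -NaN is still NaN, so sorting the
# negated array lists valid values from largest to smallest, then the NaNs.
order = np.argsort(-a)

# Keep at most as many indices as there are non-NaN values.
n_valid = np.count_nonzero(~np.isnan(a))
ind = order[:min(n, n_valid)]
print(ind)     # [0 5 2]
print(a[ind])  # [9. 6. 4.]
```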
