How to get indices of top-K values from a numpy array

Question:

Suppose I have class probabilities from PyTorch or Keras predictions, produced by the softmax function:

import numpy as np
from scipy.special import softmax
probs = softmax(np.random.randn(20, 10), 1)  # 20 instances and 10 class probabilities
probs

I want to find the indices of the top-5 values in each row of this NumPy array, so that I can loop over them with something like:

for index in top_5_indices:
    if index in result:
        print('Found')

That way I can tell whether my results are among the top-5 predictions.
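A vectorized sketch of that membership check, assuming a hypothetical `labels` array that holds one true class index per instance (it is not part of the question). The softmax is computed in plain NumPy here only to keep the sketch dependency-free; it gives the same result as `scipy.special.softmax(logits, 1)`.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.standard_normal((20, 10))  # 20 instances, 10 classes
# Numerically stable softmax over the class axis
e = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = e / e.sum(axis=1, keepdims=True)

# Hypothetical ground-truth class per instance (an assumption, not from the question)
labels = rng.integers(0, 10, size=20)

# Indices of the 5 largest probabilities in each row (order within the 5 is arbitrary)
top5 = np.argpartition(probs, -5, axis=1)[:, -5:]

# Vectorized form of the "if index in result" loop: one boolean per instance
in_top5 = (top5 == labels[:, None]).any(axis=1)
print(f"{in_top5.sum()} of {len(labels)} labels are in the top 5")
```

Averaging `in_top5` gives the usual top-5 accuracy.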

PyTorch has a top-k function, and I have seen numpy.argpartition, but I have no idea how to do this in NumPy.

Asked By: Deshwal


Answers:

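It is a little more expensive, because it sorts every row in full, but argsort works. The last five columns hold the indices of the five largest probabilities, in ascending order of probability: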

idx = np.argsort(probs, axis=1)[:,-5:]
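If the most probable class should come first, the argsort slice can be reversed, and `np.take_along_axis` then fetches the matching probabilities. A minimal sketch on random data (softmax computed in plain NumPy as an assumption, to avoid the SciPy dependency):

```python
import numpy as np

rng = np.random.default_rng(1)
logits = rng.standard_normal((20, 10))
# Plain-NumPy softmax over the class axis
e = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = e / e.sum(axis=1, keepdims=True)

k = 5
# argsort is ascending, so the last k columns hold the k largest; reverse them
# so that column 0 is the most probable class
idx = np.argsort(probs, axis=1)[:, -k:][:, ::-1]
# Gather the probability for each selected index, row by row
vals = np.take_along_axis(probs, idx, axis=1)
```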

If you are working in PyTorch:

import numpy as np
import torch
from scipy.special import softmax

probs = torch.from_numpy(softmax(np.random.randn(20, 10), 1))
values, idx = torch.topk(probs, k=5, dim=-1)
Answered By: Quang Hoang

NumPy's argpartition(a, k) returns the indices that would partition a around its kth smallest element: that element lands at position k, where a full sort would put it, the indices of all smaller elements end up to its left, and the indices of all larger elements end up to its right. Because it does not sort all the elements, argpartition takes O(n) time, while argsort takes O(n log n) time.
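The partition invariant is easy to see on a small 1-D array (the values below are arbitrary, chosen only for illustration):

```python
import numpy as np

a = np.array([9, 1, 7, 3, 8, 2, 6])
k = 4
p = np.argpartition(a, k)

print(a[p[k]])              # 7, the value np.sort(a)[4] would hold
print(np.sort(a[p[:k]]))    # [1 2 3 6]: everything left of position k is smaller
print(np.sort(a[p[k+1:]]))  # [8 9]: everything right of it is larger

# A negative kth counts from the end, which is how the top-3 indices are obtained
top3 = np.argpartition(a, -3)[-3:]
print(np.sort(a[top3]))     # [7 8 9]
```

The inner `np.sort` calls are only there to make the output readable; argpartition itself leaves each side in arbitrary order.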

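So you can get the indices of the 5 largest elements in each row like this (they come out in no particular order). Note the column slice: on the 2-D probs array, [-5:] would select the last five rows instead:

np.argpartition(probs, -5, axis=1)[:, -5:]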
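When the top-k also need to be ordered, only the k selected entries per row have to be sorted, not the whole row. A minimal sketch, with random values standing in for the softmax output (an assumption; only the ordering matters here):

```python
import numpy as np

rng = np.random.default_rng(2)
probs = rng.random((20, 10))  # stand-in for softmax output
k = 5

# Partition each row (axis=1) and keep the last k columns
top_unsorted = np.argpartition(probs, -k, axis=1)[:, -k:]

# Order those k indices by descending probability; this sorts k items per row
top_vals = np.take_along_axis(probs, top_unsorted, axis=1)
order = np.argsort(-top_vals, axis=1)
top_sorted = np.take_along_axis(top_unsorted, order, axis=1)
```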
Answered By: Viktoriya Malyasova

The existing answers are correct, but I wanted to expand on them with a self-contained, pure-NumPy function that behaves like torch.topk.

Here’s the function (the explanations are in the comments):

import numpy as np

def topk(array, k, axis=-1, sorted=True):
    # np.argpartition is faster than np.argsort, but it does not return the top-k in order
    # We use array.take because you can specify the axis
    partitioned_ind = (
        np.argpartition(array, -k, axis=axis)
        .take(indices=range(-k, 0), axis=axis)
    )
    # We use the newly selected indices to find the score of the top-k values
    partitioned_scores = np.take_along_axis(array, partitioned_ind, axis=axis)
    
    if sorted:
        # Since our top-k indices are not correctly ordered, we can sort them with argsort
        # only if sorted=True (otherwise we keep it in an arbitrary order)
        sorted_trunc_ind = np.flip(
            np.argsort(partitioned_scores, axis=axis), axis=axis
        )
        
        # We again use np.take_along_axis as we have an array of indices that we use to
        # decide which values to select
        ind = np.take_along_axis(partitioned_ind, sorted_trunc_ind, axis=axis)
        scores = np.take_along_axis(partitioned_scores, sorted_trunc_ind, axis=axis)
    else:
        ind = partitioned_ind
        scores = partitioned_scores
    
    return scores, ind

To verify the correctness, you can test it against torch:

import torch
import numpy as np

x = np.random.randn(50, 50, 10, 10)

axis = 2  # Change this to any axis and it'll be fine

val_np, ind_np = topk(x, k=10, axis=axis)

val_pt, ind_pt = torch.topk(torch.tensor(x), k=10, dim=axis)

print("Values are same:", np.all(val_np == val_pt.numpy()))
print("Indices are same:", np.all(ind_np == ind_pt.numpy()))
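If PyTorch is not installed, the function can instead be checked against a slower reference built on a full `np.argsort`. A minimal sketch; `topk_reference` is a hypothetical name, and it assumes a signed integer or floating-point array (negating an unsigned array would wrap around):

```python
import numpy as np

def topk_reference(array, k, axis=-1):
    """Hypothetical helper: top-k via a full descending sort (O(n log n))."""
    # A stable sort of the negated array gives descending order
    order = np.argsort(-array, axis=axis, kind="stable")
    # Keep the first k positions along the chosen axis
    ind = np.take(order, np.arange(k), axis=axis)
    scores = np.take_along_axis(array, ind, axis=axis)
    return scores, ind

x = np.random.default_rng(3).standard_normal((4, 6, 5))
scores, ind = topk_reference(x, k=3, axis=1)
print(scores.shape, ind.shape)  # (4, 3, 5) (4, 3, 5)
```

With the `topk` function above in scope, `np.array_equal(topk(x, 3, axis=1)[0], scores)` should hold for random floating-point input, where ties are practically impossible.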
  • To be clear, np.take_along_axis is the recommended companion to np.argpartition for retrieving the original values along one axis of a higher-dimensional array.
  • np.argpartition is faster than np.argsort because it does not sort the entire array. This answer claims it takes O(n) instead of `O(n log
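As a small illustration of the first point: plain fancy indexing such as `x[ind]` would treat every entry of `ind` as an index into the first axis, whereas `np.take_along_axis` pairs each index with its own position in the other axes. A sketch on an arbitrary 3-D array, taking the top 2 along axis 1:

```python
import numpy as np

x = np.random.default_rng(4).standard_normal((2, 8, 3))
k, axis = 2, 1

# Indices of the k largest entries along axis 1; shape (2, k, 3), unordered
ind = np.argpartition(x, -k, axis=axis)[:, -k:, :]

# Look up each index at its own (row, column) position in the other axes
vals = np.take_along_axis(x, ind, axis=axis)
```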
Answered By: xhluca