Numpy function to get the quantile that corresponds to a given value

Question:

I see a lot of questions like this one for R, but I couldn’t find one specifically for Python, preferably using numpy.

Let’s say I have an array of observations stored in x. I can get the value below which q * 100 percent of the observations fall:

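# Import numpy
import numpy as np

# Example observations (hypothetical data, for illustration only)
x = np.random.default_rng(0).normal(size=1000)

# Get the 75th percentile
np.quantile(a=x, q=0.75)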

However, I was wondering if there’s a function that does the inverse, that is, a numpy function that takes a value as input and returns q.

To further expand on this, scipy distribution objects have a cdf method that does this (their ppf method goes the other way, from q to a value, like np.quantile). I’m looking for something similar in numpy. Does it exist?

Asked By: Arturo Sbr


Answers:

If x is sorted, the value at index i is the i / len(x) quantile, i.e. the 100 * i / len(x) percentile (or so, depending on how you want to treat boundary conditions). If x is not sorted, you can obtain the same result for x[i] by substituting x.argsort().argsort()[i] for i (or just sorting x first). Applying argsort to a permutation returns the inverse permutation, so the double argsort gives the rank of each element, i.e. the position where it would fall in the sorted array.
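A minimal sketch of the double-argsort idea, assuming a 1-D array whose values are all distinct (ties are discussed in a later answer). The sample data is made up, and dividing by len(x) - 1, so that the smallest value maps to 0 and the largest to 1, is one choice of boundary convention, not the only one.

```python
import numpy as np

x = np.array([0.3, 2.5, -1.0, 1.2, 0.7])  # example data, all values distinct

# argsort gives the permutation that sorts x; argsort of that permutation
# is its inverse, i.e. the position (rank) of each x[i] in np.sort(x).
ranks = x.argsort().argsort()

# Each element of x sits at its rank in the sorted array.
assert np.array_equal(np.sort(x)[ranks], x)

# Scale ranks to [0, 1]: the minimum maps to 0.0, the maximum to 1.0.
quantiles = ranks / (len(x) - 1)

# With distinct values this convention matches np.quantile's default
# linear interpolation, so feeding the quantiles back recovers x.
assert np.allclose(np.quantile(x, quantiles), x)
print(quantiles)
```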

If you want to find the result for arbitrary values not necessarily in x, you can apply np.searchsorted to a sorted version of x and interpolate on the result. You could also use a more sophisticated method, such as fitting a spline to the sorted data.
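A sketch of both variants for arbitrary query values, assuming a 1-D array of distinct observations (the data and query values are made up). np.searchsorted on the sorted array counts the observations at or below each query, which gives an empirical CDF; np.interp instead interpolates linearly between the sorted observations, which inverts np.quantile's default linear interpolation for values inside the data range.

```python
import numpy as np

x = np.array([4.0, 1.0, 3.0, 2.0, 5.0])  # example observations (distinct)
xs = np.sort(x)
n = xs.size

values = np.array([0.5, 2.0, 2.5, 5.0, 6.0])  # arbitrary query values

# Empirical CDF: fraction of observations <= each value.
# side="right" counts ties as "below"; side="left" would count only
# observations strictly smaller than the value.
ecdf = np.searchsorted(xs, values, side="right") / n

# Piecewise-linear inverse of np.quantile (default linear interpolation):
# the k-th smallest observation sits at quantile k / (n - 1).
# np.interp clamps values outside [xs[0], xs[-1]] to 0 and 1.
q = np.interp(values, xs, np.linspace(0.0, 1.0, n))

print(ecdf)
print(q)

# Round trip for the query values that lie inside the data range.
assert np.allclose(np.quantile(x, q[1:4]), values[1:4])
```

np.interp assumes its xp argument is increasing, so with repeated observations you would need to decide how to treat ties before using the interpolating version.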

Answered By: Mad Physicist

There’s a convenience function in SciPy, scipy.stats.percentileofscore, that does this. Note that it’s not an exact inverse of np.quantile, because sample quantiles are only defined up to a choice of interpolation rule. Given a finite array of observations, the percentiles take discrete values; the q you specify may fall between them, and the quantile/percentile functions then interpolate between neighbouring observations or pick the nearest one, depending on the method.

from scipy import stats
import numpy as np

# Fraction of the observations that are <= 0.65, rescaled from percent to [0, 1]
stats.percentileofscore(np.arange(0, 1, 0.12), 0.65, kind='weak') / 100

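If SciPy isn't available, the 'strict', 'weak' and 'mean' kinds of percentileofscore reduce to simple comparisons in NumPy. This is a minimal sketch based on how those kinds are defined (share of values below the score, share at or below it, and the average of the two); the default kind='rank', which averages the rankings of tied entries, is not reproduced here. The data is an arbitrary example with a repeated value so that the kinds differ.

```python
import numpy as np

a = np.array([1, 2, 3, 3, 4])  # example data with a tie at 3
score = 3

# NumPy counterparts of percentileofscore's `kind` options,
# expressed as fractions in [0, 1] rather than percentages.
strict = np.mean(a < score)   # kind='strict': share of values below score
weak = np.mean(a <= score)    # kind='weak': share at or below score (the ECDF)
avg = (strict + weak) / 2     # kind='mean': average of the two

print(strict, weak, avg)
```

Multiplying these fractions by 100 gives the percentages that percentileofscore reports for the corresponding kind.
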
Answered By: BatWannaBe

Not a ready-made function but a compact and reasonably fast snippet:

(a<value).mean()

You can (at least on my machine) squeeze out a few percent better performance by using np.count_nonzero:

np.count_nonzero(a<value) / a.size

but tbh I wouldn’t even bother.
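The same one-liner extends to many query values at once through broadcasting. A sketch assuming a 1-D array a and a 1-D array of query values (both made up here); it builds a len(values)-by-len(a) boolean matrix, so memory grows with the product of the two sizes, and for large inputs the np.searchsorted approach from an earlier answer scales better.

```python
import numpy as np

a = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])  # example observations
values = np.array([1.0, 4.0, 10.0])                      # example query values

# Compare every query value against every observation:
# row i holds a < values[i], so each row's mean is the fraction strictly below.
fractions = (a[np.newaxis, :] < values[:, np.newaxis]).mean(axis=1)

# Same result via count_nonzero, normalised by the number of observations.
counts = np.count_nonzero(a < values[:, np.newaxis], axis=1)
assert np.allclose(fractions, counts / a.size)

print(fractions)
```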

Answered By: loopy walt

While vals = x.argsort().argsort()/(x.size-1) works for arrays whose values are all unique, it fails if there are repeated values. Identical values should have the same quantile, but if, for example, the array x had 200 zeros and 800 values larger than zero, this method would give those 200 zeros 200 different quantile values. It is safer to use vals = np.array([np.count_nonzero(x<x_i)/(x.size-1) for x_i in x]), since identical values then get identical quantile positions.
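A small illustration of the tie problem, using a made-up array in which 0 appears three times: the double argsort hands the three zeros three different positions, while the counting approach gives them all 0.

```python
import numpy as np

x = np.array([0.0, 0.0, 0.0, 5.0, 2.0])  # example data: three tied zeros

# Rank-based positions: every element gets its own rank, so ties are
# broken (in no guaranteed order) and tied values end up apart.
by_rank = x.argsort().argsort() / (x.size - 1)

# Count-based positions: each x_i gets the number of values strictly below it.
by_count = np.array([np.count_nonzero(x < x_i) / (x.size - 1) for x_i in x])

print(by_rank)   # the three zeros get three different values
print(by_count)  # the three zeros all get 0.0
```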

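import numpy as np

def get_quant(x):
    """For each value in x, return the quantile it corresponds to."""
    x = np.asarray(x)
    # Count the observations strictly smaller than each x_i,
    # so tied values share the same quantile.
    return np.array([np.count_nonzero(x < x_i) / (x.size - 1) for x_i in x])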

Note: the (x.size-1) denominators make the quantile values range from 0 to 1 inclusive, although the maximum reaches 1 only if it occurs once (k tied maxima all get (x.size-k)/(x.size-1)). Leaving out the -1 means the 100% quantile is never reached.
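The list comprehension compares every element against every other, which is O(n²). A sketch of an equivalent O(n log n) version, assuming a 1-D numeric array with at least two elements: np.searchsorted with side="left" on the sorted array returns, for each x_i, the number of observations strictly smaller than x_i, which is exactly what np.count_nonzero(x < x_i) counts. The name get_quant_fast is hypothetical.

```python
import numpy as np

def get_quant(x):
    """Reference version: loops over every element."""
    x = np.asarray(x)
    return np.array([np.count_nonzero(x < x_i) / (x.size - 1) for x_i in x])

def get_quant_fast(x):
    """Hypothetical vectorised equivalent of get_quant."""
    x = np.asarray(x)
    # side="left" yields, for each x_i, the count of elements strictly below it,
    # so tied values receive the same position.
    below = np.searchsorted(np.sort(x), x, side="left")
    return below / (x.size - 1)

x = np.array([0.0, 0.0, 0.0, 2.0, 1.0, 3.0])  # example data with ties
assert np.allclose(get_quant(x), get_quant_fast(x))
print(get_quant_fast(x))
```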

Answered By: William Black