Find the percentile of a value

Question:

I have an array of values like [1,2,3,4,5] and I need to find the percentile of each value. The output I am expecting is something like [0,25,50,75,100].

I searched for an API in numpy that could get the desired result and found np.percentile but it does the opposite. Given a percentile value, it will find a value using the input list as the distribution.

Is there an API or another way to get this? Thanks

Asked By: Clock Slave


Answers:

You can use a list comprehension: subtract 1 from each value in the list and divide by max(lst) - 1.

lst = [1,2,3,4,5]
max_val = max(lst) -1
lst = [(elem-1)/max_val * 100 for elem in lst]
print(lst)

Output

[0.0, 25.0, 50.0, 75.0, 100.0]

You can also achieve this using numpy arrays.

import numpy as np

arr = np.array([1,2,3,4,5])
result = (arr - 1) / (np.max(arr) - 1) * 100

To map the value 1 to 0, apply an offset: compute the max and subtract one, subtract one from each of the other values as well, then compute the percentages in a list comprehension:

lst = [1,2,3,4,5]
maxval = max(lst)-1
newlst = [(v-1)*100/maxval for v in lst]

print(newlst)

Result (as floats; if you need integers, use // for the division):

[0.0, 25.0, 50.0, 75.0, 100.0]

If your input can contain arbitrary numbers (e.g. [3, 7, 13, 20]) that are to be mapped to 0% – 100%, then you need to find the minimum and the maximum and stretch your values to 0 … 100:

values = [ 3, 7, 13, 20 ]
min_value = min(values)
max_value = max(values)
for value in values:
  fraction = float(value - min_value) / (max_value - min_value)
  percentage = fraction * 100
  print(value, percentage)

Or as a comprehension:

percentiles = [ float(value - min_value) / (max_value - min_value) * 100
                for value in values ]

This can also be sped up using numpy for large inputs:

import numpy as np

values = np.array([ 3, 7, 13, 20 ])
min_value = values.min()
max_value = values.max()
percentiles = (values - min_value) / (max_value - min_value) * 100
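
One edge case the scaling above does not cover is an input where all values are equal, because max_value - min_value is then zero. A minimal pure-Python sketch of the same min-max scaling with a guard might look like this. The function name scale_to_percent and the choice to return 0.0 for constant input are assumptions made for illustration, not part of the original answer:

```python
def scale_to_percent(values):
    """Map values linearly so that the minimum becomes 0.0 and the maximum 100.0."""
    if not values:
        return []
    min_value = min(values)
    span = max(values) - min_value
    if span == 0:
        # Assumption: a constant input has no spread, so map everything to 0.0
        return [0.0 for _ in values]
    return [(value - min_value) / span * 100 for value in values]


print(scale_to_percent([3, 7, 13, 20]))
print(scale_to_percent([1, 2, 3, 4, 5]))  # [0.0, 25.0, 50.0, 75.0, 100.0]
```

Whether constant input should map to 0, 100, or raise an error depends on the use case.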
Answered By: Alfe

I take the definition of percentile (from Wikipedia) to be:

One definition of percentile, often given in texts, is that the P-th percentile ( 0 < P ≤ 100 ) of a list of N ordered values (sorted from least to greatest) is the smallest value in the list such that no more than P percent of the data is strictly less than the value and at least P percent of the data is less than or equal to that value.

So, for your data the answer is:

[20,40,60,80,100]

I also assume that you don’t have a uniform distribution and that numbers can repeat. You can build a dictionary to look up the results using:

nbr = [1,1,3,4,5]
sorted_nbr = sorted(nbr)
ans = {x: 100*(1+i)/len(sorted_nbr) for i,x in enumerate(sorted_nbr)}

This yields:

{1: 40.0, 3: 60.0, 4: 80.0, 5: 100.0}

And if you need the list, then use:

[ans[x] for x in nbr]
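
The same "percent of values less than or equal to x" definition can also be computed without the intermediate dictionary, using the standard-library bisect module. This is a sketch under that definition only, and the function name percent_at_or_below is hypothetical:

```python
from bisect import bisect_right


def percent_at_or_below(values):
    """For each value, return the percentage of entries that are <= that value."""
    sorted_values = sorted(values)
    n = len(sorted_values)
    # bisect_right gives the number of sorted entries that are <= value
    return [100 * bisect_right(sorted_values, value) / n for value in values]


print(percent_at_or_below([1, 1, 3, 4, 5]))  # [40.0, 40.0, 60.0, 80.0, 100.0]
```

Repeated values get the same result here, matching the dictionary lookup above.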
Answered By: Eolmar

To get a value’s percentile within a given dataset use scipy’s percentileofscore.

from scipy.stats import percentileofscore

dataset = [1,2,3,4,5]

percentile_of_3 = percentileofscore(dataset, 3)
print(percentile_of_3)

[Output] 60.0

This output means that 60% of the values in the dataset are less than or equal to 3. percentileofscore’s kind argument (default 'rank') can be used to specify whether the percentile’s cutoff should be inclusive or exclusive. For example:

percentile_of_3 = percentileofscore(dataset, 3, kind='strict')
print(percentile_of_3)

[Output] 40.0

means that 40% of the values in the dataset are less than 3.
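
To make the inclusive versus exclusive distinction concrete, the two counts can be written out in plain Python. This is only an illustration of the two definitions described above, not scipy’s implementation, and the helper names are hypothetical:

```python
def percent_less_or_equal(data, score):
    """Inclusive cutoff: share of values that are <= score."""
    return 100 * sum(1 for x in data if x <= score) / len(data)


def percent_strictly_less(data, score):
    """Exclusive cutoff: share of values that are < score."""
    return 100 * sum(1 for x in data if x < score) / len(data)


dataset = [1, 2, 3, 4, 5]
print(percent_less_or_equal(dataset, 3))  # 60.0
print(percent_strictly_less(dataset, 3))  # 40.0
```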

If we want a list containing the percentile of each value, we can use a list comprehension:

all_percentiles = [percentileofscore(dataset, value, kind='strict') for value in dataset]

[Output] [0.0, 20.0, 40.0, 60.0, 80.0]

(Thanks to Cobra for the edit advice!)

Answered By: kidbilly

You should use np.true_divide.

import numpy as np

x = np.arange(5)
np.true_divide(x, 4)*100
[Output] array([  0.,  25.,  50.,  75., 100.])
Answered By: Clerk