Find the percentile of a value
Question:
I have an array of values like [1,2,3,4,5] and I need to find the percentile of each value. The output I am expecting is something like [0,25,50,75,100].
I searched for an API in numpy that could produce this and found np.percentile, but it does the opposite: given a percentile, it returns the corresponding value, using the input list as the distribution.
Is there an API or another way to get this? Thanks
Answers:
You can use a list comprehension: subtract 1 from each element and divide by max(lst) - 1. This assumes the smallest value is 1, as in your example:
lst = [1,2,3,4,5]
max_val = max(lst) -1
lst = [(elem-1)/max_val * 100 for elem in lst]
print(lst)
Output
[0.0, 25.0, 50.0, 75.0, 100.0]
You can also achieve this using numpy arrays.
import numpy as np
arr = np.array([1,2,3,4,5])
result = (arr - 1) / (np.max(arr) - 1) * 100
To make the value 1 map to 0, subtract one from the maximum and from each value, then compute the percentage in a list comprehension:
lst = [1,2,3,4,5]
maxval = max(lst)-1
newlst = [(v-1)*100/maxval for v in lst]
print(newlst)
Result (as floats; if you need integers, use // for the division):
[0.0, 25.0, 50.0, 75.0, 100.0]
If your input can contain arbitrary numbers (e.g. [3, 7, 13, 20]) which are to be mapped to 0% – 100%, then you need to find the minimum and the maximum and stretch your values to 0 … 100:
values = [ 3, 7, 13, 20 ]
min_value = min(values)
max_value = max(values)
for value in values:
    fraction = float(value - min_value) / (max_value - min_value)
    percentage = fraction * 100
    print(value, percentage)
Or as a comprehension:
percentiles = [ float(value - min_value) / (max_value - min_value) * 100
                for value in values ]
This can also be sped up using numpy for large inputs:
import numpy as np
values = np.array([ 3, 7, 13, 20 ])
min_value = values.min()
max_value = values.max()
percentiles = (values - min_value) / (max_value - min_value) * 100
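The min/max stretch above divides by (max_value - min_value), which is zero when every input is equal. A minimal sketch of one way to guard against that, assuming it is acceptable to return all zeros in that case; scale_to_percent is a hypothetical helper name, not a numpy function:

```python
import numpy as np

def scale_to_percent(values):
    """Map values linearly so the minimum becomes 0 and the maximum 100.

    Hypothetical helper. Returns all zeros when every value is equal,
    which is an assumed convention (the range is zero in that case).
    """
    arr = np.asarray(values, dtype=float)
    value_range = arr.max() - arr.min()
    if value_range == 0:
        # Constant input: avoid dividing by zero.
        return np.zeros_like(arr)
    return (arr - arr.min()) / value_range * 100

print(scale_to_percent([3, 7, 13, 20]))
print(scale_to_percent([5, 5, 5]))
```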
I take the definition of percentile (from Wikipedia) as:
One definition of percentile, often given in texts, is that the P-th percentile ( 0 < P ≤ 100 ) of a list of N ordered values (sorted from least to greatest) is the smallest value in the list such that no more than P percent of the data is strictly less than the value and at least P percent of the data is less than or equal to that value.
So, for your data the answer is:
[20,40,60,80,100]
I also assume that your data are not uniformly distributed and that numbers can repeat. You can build a dictionary to look up the results using:
nbr = [1,1,3,4,5]
sorted_nbr = sorted(nbr)
ans = {x: 100*(1+i)/len(sorted_nbr) for i,x in enumerate(sorted_nbr)}
This yields:
{1: 40.0, 3: 60.0, 4: 80.0, 5: 100.0}
And if you need the list, then use:
[ans[x] for x in nbr]
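The same "percent of values less than or equal to x" idea can be sketched without the intermediate dictionary, using the standard-library bisect module. percent_at_or_below is a hypothetical name chosen for illustration:

```python
from bisect import bisect_right

def percent_at_or_below(values):
    """For each value, the percentage of values that are <= it."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right returns how many sorted entries are <= x,
    # so repeated values all receive the same percentage.
    return [100 * bisect_right(ordered, x) / n for x in values]

print(percent_at_or_below([1, 1, 3, 4, 5]))  # [40.0, 40.0, 60.0, 80.0, 100.0]
```

The results come back in the order of the input, so no second lookup pass is needed.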
To get a value’s percentile within a given dataset use scipy’s percentileofscore.
from scipy.stats import percentileofscore
dataset = [1,2,3,4,5]
percentile_of_3 = percentileofscore(dataset, 3)
print(percentile_of_3)
[Output] 60.0
This output means that 60% of the values in the dataset are less than or equal to 3. percentileofscore’s kind argument specifies whether the percentile’s cutoff is inclusive or exclusive. For example:
percentile_of_3 = percentileofscore(dataset, 3, kind='strict')
print(percentile_of_3)
[Output] 40.0
This means that 40% of the values in the dataset are strictly less than 3.
If we want a list containing percentiles for each value, we can use list comprehension:
all_percentiles = [percentileofscore(dataset, value, kind='strict') for value in dataset]
[Output] [0.0, 20.0, 40.0, 60.0, 80.0]
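Calling percentileofscore once per element rescans the dataset each time. For larger inputs, the strict count can be sketched with a single sort plus np.searchsorted. This is an alternative written for illustration, not how scipy implements it, and strict_percentiles is a hypothetical name:

```python
import numpy as np

def strict_percentiles(values):
    """Percentage of values strictly less than each value."""
    arr = np.asarray(values)
    ordered = np.sort(arr)
    # side="left" gives, for each value, the number of sorted
    # entries that are strictly smaller than it.
    counts_below = np.searchsorted(ordered, arr, side="left")
    return counts_below / arr.size * 100

print(strict_percentiles([1, 2, 3, 4, 5]))
```

Passing side="right" instead counts the entries that are less than or equal to each value, which gives the inclusive variant.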
(Thanks to Cobra for the edit advice!)
You can use np.true_divide (here the values already start at 0):
import numpy as np
x = np.arange(5)
np.true_divide(x, 4)*100
[Output] array([  0.,  25.,  50.,  75., 100.])
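The division by 4 works because the inputs are exactly 0..4, so 4 is the number of values minus one. A rough sketch of the same idea for other inputs replaces each value with its position in sorted order first. It assumes at least two values and no ties (tied values would get arbitrary neighbouring positions); rank_percent is a hypothetical name:

```python
import numpy as np

def rank_percent(values):
    """Spread values evenly over 0..100 by their position in sorted order."""
    arr = np.asarray(values)
    # argsort of argsort yields each element's 0-based rank.
    ranks = np.argsort(np.argsort(arr))
    # Assumes arr.size >= 2; the largest value maps to 100.
    return np.true_divide(ranks, arr.size - 1) * 100

print(rank_percent([3, 7, 13, 20, 50]))
```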