# Most efficient way to convert list of values to probability distribution?

## Question:

I have several lists that can only contain the following values: 0, 0.5, 1, 1.5

I want to efficiently convert each of these lists into probability mass functions. So if a list is as follows: [0.5, 0.5, 1, 1.5], the PMF will look like this: [0, 0.5, 0.25, 0.25].

I need to do this many times (and with very large lists), so avoiding looping will be optimal, if at all possible. What’s the most efficient way to make this happen?

Edit: Here’s my current system. This feels like a really inefficient/inelegant way to do it:

``````
import numpy as np
from collections import Counter


def get_distribution(samplemodes1):
    n, bin_edges = np.histogram(samplemodes1, bins = 9)
    totalcount = np.sum(n)
    bin_probability = n / totalcount
    bins_per_point = np.fmin(np.digitize(samplemodes1, bin_edges), len(bin_edges)-1)
    probability_perpoint = [bin_probability[bins_per_point[i]-1] for i in range(len(samplemodes1))]

    counts = Counter(samplemodes1)
    total = sum(counts.values())

    probability_mass = {k:v/total for k,v in counts.items()}
    #print(probability_mass)

    key_values = {}

    if(0 in probability_mass):
        key_values[0] = probability_mass.get(0)
    else:
        key_values[0] = 0
    if(0.5 in probability_mass):
        key_values[0.5] = probability_mass.get(0.5)
    else:
        key_values[0.5] = 0
    if(1 in probability_mass):
        key_values[1] = probability_mass.get(1)
    else:
        key_values[1] = 0
    if(1.5 in probability_mass):
        key_values[1.5] = probability_mass.get(1.5)
    else:
        key_values[1.5] = 0

    distribution = list(key_values.values())
    return distribution
``````

## Answer:

Here are some solutions for you to benchmark:

#### Using `collections.Counter`

``````
from collections import Counter

bins = [0, 0.5, 1, 1.5]
a = [0.5, 0.5, 1.0, 0.5, 1.0, 1.5, 0.5]
denom = len(a)
counts = Counter(a)  # one pass over the data; missing keys count as 0
pmf = [counts[b] / denom for b in bins]
``````

#### NumPy based solution

``````
import numpy as np

bins = [0, 0.5, 1, 1.5]
a = np.array([0.5, 0.5, 1.0, 0.5, 1.0, 1.5, 0.5])
denom = len(a)
# one vectorized comparison per bin, so the array is scanned len(bins) times
pmf = [(a == b).sum() / denom for b in bins]
``````

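You can probably do better with `np.bincount()`, which counts every value in a single pass over the array. It only accepts non-negative integers, so the values first have to be mapped to integer indices (for example, multiplying by 2 maps 0, 0.5, 1, 1.5 to 0, 1, 2, 3).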
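A minimal sketch of the `np.bincount()` idea, assuming the only possible values are the evenly spaced 0, 0.5, 1, 1.5 from the question and that the input is non-empty. The function name `pmf_bincount` and its `step` and `n_bins` parameters are hypothetical, chosen just for this illustration:

```python
import numpy as np


def pmf_bincount(values, step=0.5, n_bins=4):
    """Return the PMF over the bins 0, step, 2*step, ... as a NumPy array.

    Assumes every value is a non-negative multiple of `step` below
    n_bins * step, and that `values` is not empty.
    """
    a = np.asarray(values, dtype=float)
    # Map each value to an integer bin index: 0 -> 0, 0.5 -> 1, 1 -> 2, 1.5 -> 3.
    # np.rint rounds to the nearest integer before the cast, which guards
    # against small floating-point errors in the division.
    idx = np.rint(a / step).astype(np.intp)
    # minlength ensures bins that never occur still appear with a count of 0.
    counts = np.bincount(idx, minlength=n_bins)
    return counts / a.size


pmf = pmf_bincount([0.5, 0.5, 1, 1.5])
print(pmf.tolist())  # [0.0, 0.5, 0.25, 0.25]
```

Whether this actually beats the `Counter` version depends on the list sizes and on whether the data already lives in a NumPy array, so it is worth timing all three on your real inputs.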

Further reading on this idea: https://thispointer.com/count-occurrences-of-a-value-in-numpy-array-in-python/
