Most efficient way to convert list of values to probability distribution?

Question

I have several lists that can only contain the following values: 0, 0.5, 1, and 1.5.

I want to efficiently convert each of these lists into probability mass functions. So if a list is as follows: [0.5, 0.5, 1, 1.5], the PMF will look like this: [0, 0.5, 0.25, 0.25].
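Put another way, each entry of the PMF is the relative frequency of one of the four allowed values. Here the list is $x_1, \dots, x_n$ and $n$ is its length:

```latex
p(v) = \frac{\#\{\, i : x_i = v \,\}}{n}, \qquad v \in \{0,\ 0.5,\ 1,\ 1.5\}

% Worked example for [0.5, 0.5, 1, 1.5], n = 4:
p(0) = \tfrac{0}{4} = 0, \quad p(0.5) = \tfrac{2}{4} = 0.5, \quad
p(1) = \tfrac{1}{4} = 0.25, \quad p(1.5) = \tfrac{1}{4} = 0.25
```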

I need to do this many times (and with very large lists), so avoiding Python-level loops would be ideal, if at all possible. What’s the most efficient way to do this?

Edit: Here’s my current approach. It feels like a really inefficient and inelegant way to do it:

import numpy as np
from collections import Counter

def get_distribution(samplemodes1):
    # Histogram-based per-point probabilities. Note: probability_perpoint is
    # never used below, so these lines do not affect the return value.
    n, bin_edges = np.histogram(samplemodes1, bins = 9)
    totalcount = np.sum(n)
    bin_probability = n / totalcount
    bins_per_point = np.fmin(np.digitize(samplemodes1, bin_edges), len(bin_edges)-1)
    probability_perpoint = [bin_probability[bins_per_point[i]-1] for i in range(len(samplemodes1))]

    # Relative frequency of each distinct value that actually occurs.
    counts = Counter(samplemodes1)
    total = sum(counts.values())

    probability_mass = {k:v/total for k,v in counts.items()}
    #print(probability_mass)

    # Fill in all four possible values, using 0 for values that never occur.
    key_values = {}

    if 0 in probability_mass:
        key_values[0] = probability_mass.get(0)
    else:
        key_values[0] = 0
    if 0.5 in probability_mass:
        key_values[0.5] = probability_mass.get(0.5)
    else:
        key_values[0.5] = 0
    if 1 in probability_mass:
        key_values[1] = probability_mass.get(1)
    else:
        key_values[1] = 0
    if 1.5 in probability_mass:
        key_values[1.5] = probability_mass.get(1.5)
    else:
        key_values[1.5] = 0

    distribution = list(key_values.values())
    return distribution

Asked By: Anthony Petruzzio


Answer 1

Here are some solutions for you to benchmark:

Using `collections.Counter`

from collections import Counter

bins = [0, 0.5, 1, 1.5]
a = [0.5, 0.5, 1.0, 0.5, 1.0, 1.5, 0.5]
denom = len(a)
counts = Counter(a)
# Counter returns 0 for values that never occur, so every bin gets an entry.
pmf = [counts[b] / denom for b in bins]

NumPy-based solution

import numpy as np

bins = [0, 0.5, 1, 1.5]
a = np.array([0.5, 0.5, 1.0, 0.5, 1.0, 1.5, 0.5])
denom = len(a)
# One vectorized comparison over the whole array per bin (four passes in total).
pmf = [(a == b).sum() / denom for b in bins]
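The remaining Python loop over the four bins can also be removed with broadcasting. This is a sketch using the same example data. It assumes the intermediate boolean matrix of shape `(len(a), 4)` fits in memory, which may matter for very large lists:

```python
import numpy as np

bins = np.array([0, 0.5, 1, 1.5])
a = np.array([0.5, 0.5, 1.0, 0.5, 1.0, 1.5, 0.5])

# a[:, None] has shape (len(a), 1) and bins has shape (4,), so the comparison
# broadcasts to a boolean matrix of shape (len(a), 4): row i, column j is True
# when a[i] equals bins[j].
matches = a[:, None] == bins
# The mean of each column is the fraction of elements equal to that bin.
pmf = matches.mean(axis=0)
print(pmf)  # probabilities 0, 4/7, 2/7, 1/7
```

This trades the four separate passes for a single broadcast comparison. The cost is that it allocates `len(a) * 4` booleans at once.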

But you can probably do better by using `np.bincount()` instead, since it counts all values in a single pass over the array.
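A minimal sketch of the `np.bincount()` idea follows. `np.bincount` only accepts non-negative integers, so this version assumes every element is exactly one of 0, 0.5, 1, 1.5 and maps them to the integer codes 0 to 3 by doubling. It also assumes the input is non-empty. The function name `pmf_bincount` is hypothetical.

```python
import numpy as np

def pmf_bincount(a):
    """PMF over the values 0, 0.5, 1, 1.5 using np.bincount.

    Assumes a non-empty input whose elements are exactly those four values.
    """
    a = np.asarray(a, dtype=float)
    # Map 0, 0.5, 1, 1.5 -> 0, 1, 2, 3; np.rint guards against tiny
    # floating-point error before the integer cast.
    codes = np.rint(a * 2).astype(np.intp)
    # minlength=4 keeps a slot (with count 0) for values that never occur.
    counts = np.bincount(codes, minlength=4)
    return counts / codes.size

print(pmf_bincount([0.5, 0.5, 1, 1.5]))  # probabilities 0, 0.5, 0.25, 0.25
```

If a list could contain any other value, the result would silently be longer than 4 or wrong. So it is worth validating the input first if that assumption is not guaranteed.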

Further reading on this idea: https://thispointer.com/count-occurrences-of-a-value-in-numpy-array-in-python/
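Since the question mentions doing this many times, one option is to handle a whole batch of lists in a single `np.bincount` call. Each list gets its own block of four bins. This is a sketch under the same assumptions as the single-list version: only the four allowed values appear, every list is non-empty, and there is at least one list. The helper name `pmfs_for_many` is hypothetical. The lists may have different lengths; the only Python-level loop left runs over the lists, not over their elements.

```python
import numpy as np

def pmfs_for_many(lists):
    """Return an array of shape (len(lists), 4): one PMF row per input list."""
    lengths = np.array([len(lst) for lst in lists])
    flat = np.concatenate([np.asarray(lst, dtype=float) for lst in lists])
    codes = np.rint(flat * 2).astype(np.intp)  # 0, 0.5, 1, 1.5 -> 0..3
    # owner[i] is the index of the list that element i came from.
    owner = np.repeat(np.arange(len(lists)), lengths)
    # List k uses bins 4k .. 4k+3, so all lists are counted in one call.
    counts = np.bincount(owner * 4 + codes, minlength=4 * len(lists))
    # One row per list; normalize each row by that list's length.
    return counts.reshape(len(lists), 4) / lengths[:, None]

result = pmfs_for_many([[0.5, 0.5, 1, 1.5], [0, 0, 1.5]])
# Row 0: 0, 0.5, 0.25, 0.25; row 1: 2/3, 0, 0, 1/3
```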

Answered By: joanis
