sample using negated normal via numpy


I have a sorted list that I need to sample from. I want to favor the items towards each end of the list. In other words, I want to sample from the list using a negated normal function such that the first and last entries of the list are chosen more frequently than items in the middle of the list. I tried this:

slots = np.floor(np.random.normal(scale=len(children)//2, size=max_children)) - max_children//2
return children[slots]

However, it returns numbers that are out of range. It also returns duplicate numbers. What can I do better?

Asked By: Brannon



As you are working with a list of discrete values, I would argue you would rather work with a multinomial distribution over the list indices. In NumPy this can be done conveniently with the np.random.choice method, which directly takes the probabilities associated with each entry. Here is a minimal example:

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

random_state = np.random.RandomState(4873)

children = np.arange(20)

p = norm.pdf(np.arange(len(children)), loc=(len(children) - 1) / 2, scale=10)
p = p.max() - p + 0.01  # invert the bell curve; the 0.01 offset keeps the middle items from ending up with zero probability
samples = random_state.choice(children, p=p / p.sum(), size=10_000)

fig, axes = plt.subplots(1, 2, figsize=(10, 5))

axes[0].bar(children, p)
axes[0].set_title("Probability mass function")

axes[1].hist(samples, bins=np.arange(len(children) + 1) - 0.5, density=True, alpha=0.5)

plt.show()

Which plots:
[Figure: left panel, bar chart titled "Probability mass function"; right panel, normalized histogram of the samples]

I also think random.choice makes the intention of drawing samples from the list much clearer.
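The question also mentions duplicates. If distinct items are wanted, choice accepts replace=False, which draws without replacement. The following is only a minimal sketch under that assumption: the name n_draws and the placeholder list are made up for illustration, and the weights are the same inverted normal as in the example above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.RandomState(4873)

children = np.arange(20)  # placeholder for the sorted list
n_draws = 5               # hypothetical; must not exceed len(children)

# Same inverted-normal weights as in the example above
idx = np.arange(len(children))
weights = norm.pdf(idx, loc=(len(children) - 1) / 2, scale=10)
weights = weights.max() - weights + 0.01
weights /= weights.sum()

# replace=False: each index is drawn at most once, so no duplicates
slots = rng.choice(len(children), size=n_draws, replace=False, p=weights)
picked = children[slots]
```

Because choice only returns valid indices, out-of-range values cannot occur here. Note that without replacement the per-item frequencies no longer match the weights exactly, since each draw removes an item from the pool.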

However, the point about possibly using a beta distribution is still valid. In this case you would still convert it to a probability mass function as shown above.
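As a rough sketch of that conversion: one way, not necessarily the only one, is to split the interval [0, 1] into one equal-width bin per list item and use the beta CDF difference across each bin as that item's probability. The shape parameters a = b = 0.5 are an assumption chosen for illustration; any a = b < 1 gives a symmetric U shape that favors both ends.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.RandomState(4873)

children = np.arange(20)  # placeholder for the sorted list

a = b = 0.5  # assumed shape parameters; a = b < 1 gives a symmetric U shape

# One equal-width bin per item; the mass of each bin is the CDF difference
edges = np.linspace(0, 1, len(children) + 1)
p_beta = np.diff(beta.cdf(edges, a, b))
p_beta /= p_beta.sum()  # guard against rounding error

samples_beta = rng.choice(children, p=p_beta, size=10_000)
```

Using bin masses rather than evaluating the density at points avoids the infinite density of the beta distribution at 0 and 1 when the shape parameters are below 1.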

I hope this helps!

Answered By: Axel Donath