Cryptographically-secure, exactly-weighted sampling

Question:

How do I choose k elements with replacement and weights under the following conditions?

  • Randomness must be cryptographically-secure, e.g. as used in the secrets module.
  • Weighting must be exact, i.e. use integral instead of floating-point arithmetic.

Self-authored code is likely to be less secure and less efficient than existing implementations. To the best of my understanding, the following implementations don’t meet my requirements.

Asked By: enabtay0s9ex8dyq


Answers:

I would just rip apart the choices implementation from the random module. Something like:

from random import SystemRandom
from itertools import accumulate as _accumulate, repeat as _repeat
from bisect import bisect as _bisect

def choices(population, weights, *, k=1):
    # SystemRandom is backed by os.urandom, so the draws come from the OS's secure source
    randrange = SystemRandom().randrange
    n = len(population)
    cum_weights = list(_accumulate(weights))
    if len(cum_weights) != n:
        raise ValueError('The number of weights does not match the population')
    total = cum_weights[-1]
    if not isinstance(total, int):
        raise ValueError('Weights must be integer values')
    if total <= 0:
        raise ValueError('Total of weights must be greater than zero')
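    bisect = _bisect
    # randrange(total) is uniform over 0..total-1; bisect finds the first
    # cumulative weight strictly greater than the draw, so index i is hit
    # by exactly weights[i] of the possible draws (integer arithmetic only).
    hi = n - 1
    return [population[bisect(cum_weights, randrange(total), 0, hi)]
            for i in _repeat(None, k)]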

which could be tested as:

from collections import Counter

draws = choices([1, 2, 3], [1, 2, 3], k=1_000_000)
print(dict(sorted(Counter(draws).items())))

giving me:

{1: 166150, 2: 333614, 3: 500236}

which looks about right.
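The million-draw run only shows the weighting is approximately right. Since the lookup uses integers only, it can also be checked exactly by enumerating every value randrange(total) could return and counting which index each one maps to. A minimal sketch of that check follows; index_for and exact_counts are hypothetical helper names, not part of the random module, and the lookup is copied from choices() above:

```python
from bisect import bisect
from collections import Counter
from itertools import accumulate

def index_for(cum_weights, r):
    # Same lookup as in choices(): first index whose cumulative weight exceeds r.
    return bisect(cum_weights, r, 0, len(cum_weights) - 1)

def exact_counts(weights):
    # Enumerate every possible draw 0..total-1 and count the index each one selects.
    cum_weights = list(accumulate(weights))
    total = cum_weights[-1]
    counts = Counter(index_for(cum_weights, r) for r in range(total))
    return [counts[i] for i in range(len(weights))]

print(exact_counts([1, 2, 3]))  # [1, 2, 3]
```

If the counts equal the weights, each element is selected by exactly weights[i] out of total equally likely draws, assuming the weights are non-negative integers.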

Update: just thought to check for off-by-one errors and it seems good here:

print(
    choices([1, 2, 3], [1, 0, 0], k=5),
    choices([1, 2, 3], [0, 1, 0], k=5),
    choices([1, 2, 3], [0, 0, 1], k=5),
)

giving:

[1, 1, 1, 1, 1] [2, 2, 2, 2, 2] [3, 3, 3, 3, 3]

which also seems right.
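Since the question mentions the secrets module, the same idea could probably be written with secrets.randbelow in place of SystemRandom().randrange. The sketch below is a variant under that assumption, not the random module's own code: secure_choices is a hypothetical name, and the per-weight check for negative values is an addition, because a negative weight would make the cumulative sums non-monotonic and the bisect lookup invalid.

```python
import secrets
from bisect import bisect
from itertools import accumulate

def secure_choices(population, weights, *, k=1):
    # Hypothetical variant of choices() above, drawing via secrets.randbelow.
    weights = list(weights)
    if len(weights) != len(population):
        raise ValueError('The number of weights does not match the population')
    # Added assumption: reject non-integer or negative weights individually.
    if any(not isinstance(w, int) or w < 0 for w in weights):
        raise ValueError('Weights must be non-negative integers')
    cum_weights = list(accumulate(weights))
    total = cum_weights[-1] if cum_weights else 0
    if total <= 0:
        raise ValueError('Total of weights must be greater than zero')
    hi = len(population) - 1
    # secrets.randbelow(total) returns an int in the range [0, total).
    return [population[bisect(cum_weights, secrets.randbelow(total), 0, hi)]
            for _ in range(k)]

print(secure_choices(['a', 'b', 'c'], [0, 1, 0], k=5))  # ['b', 'b', 'b', 'b', 'b']
```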

Answered By: Sam Mason