How to apply numpy random.choice to a matrix of probability values (Vectorized solution)

Question:

The problem I have is as follows

I have a 1-D list of integers (or np.array) with 3 values

l = [0,1,2]

I have a 2-D list of probabilities (for simplicity, we’ll use two rows)

P = [[0.8, 0.1, 0.1],
     [0.3, 0.3, 0.4]]

What I want is the equivalent of numpy.random.choice(a=l, p=P), where each row of P (a probability distribution) is applied to l. That is, one random sample should be drawn from [0,1,2] with probability distribution [0.8, 0.1, 0.1], then another with probability distribution [0.3, 0.3, 0.4], giving two outputs.

===== Update ======

I can use for loops or list comprehension, but I am looking for a fast/vectorized solution.
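
For reference, the loop-based version presumably looks something like the sketch below: one numpy.random.choice call per row of P. The name `samples` is only illustrative.

```python
import numpy as np

l = [0, 1, 2]
P = np.array([[0.8, 0.1, 0.1],
              [0.3, 0.3, 0.4]])

# Non-vectorized baseline: draw one value from l for each row of P.
samples = np.array([np.random.choice(l, p=row) for row in P])
print(samples)  # one entry per row of P, each taken from l
```

This works, but it makes one Python-level call per row, which is the cost a vectorized solution should avoid.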

Asked By: max_max_mir


Answers:

Here’s one way.

Here’s the array of probabilities:

In [161]: p
Out[161]: 
array([[ 0.8 ,  0.1 ,  0.1 ],
       [ 0.3 ,  0.3 ,  0.4 ],
       [ 0.25,  0.5 ,  0.25]])

c holds the cumulative distributions:

In [162]: c = p.cumsum(axis=1)

Generate a set of uniformly distributed samples…

In [163]: u = np.random.rand(len(c), 1)

…and then see where they “fit” in c:

In [164]: choices = (u < c).argmax(axis=1)

In [165]: choices
Out[165]: array([1, 2, 2])
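
The steps above can be collected into one function. This is a minimal sketch, assuming every row of p sums to 1; the function name `sample_rows` and the `rng` parameter are hypothetical and not part of NumPy. It also forces the last cumulative value to exactly 1.0, an added guard against floating-point round-off: if the last entry of a row of c ended up slightly below 1, a large u would match no column and argmax would silently return 0.

```python
import numpy as np

def sample_rows(p, rng=None):
    """Draw one column index per row of p via the cumulative-sum trick.

    Assumes each row of p is a probability distribution (sums to 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    c = p.cumsum(axis=1)          # cumulative distribution of each row
    c[:, -1] = 1.0                # guard: round-off can leave this just below 1
    u = rng.random((len(c), 1))   # one uniform sample in [0, 1) per row
    # Index of the first column whose cumulative value exceeds u
    return (u < c).argmax(axis=1)

p = np.array([[0.8, 0.1, 0.1],
              [0.3, 0.3, 0.4],
              [0.25, 0.5, 0.25]])
choices = sample_rows(p)
print(choices)  # three indices, each in {0, 1, 2}
```

If the values to sample from are not simply 0, 1, 2, the returned indices can be mapped onto them with np.asarray(l)[choices].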
Answered By: Warren Weckesser

This question is quite old, but there might be a slightly more elegant solution based on this:
https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.multinomial.html

(I adapted the original input to work as a DataFrame).

import numpy as np
import pandas as pd

# Define the list of choices
choices = ["a", "b", "c"]

# Define the DataFrame of probability distributions
# (In each row, the probabilities of a, b and c can be different)
df_probabilities = pd.DataFrame(data=[[0.8, 0.1, 0.1],
                                      [0.3, 0.3, 0.4]],
                                columns=choices)
print(df_probabilities)
     a    b    c
0  0.8  0.1  0.1
1  0.3  0.3  0.4

# Generate a DataFrame of selections. In each row, a 1 denotes
# which choice was selected
rng = np.random.default_rng(42)
df_selections = pd.DataFrame(
    data=rng.multinomial(n=1, pvals=df_probabilities),
    columns=choices)

print(df_selections)
   a  b  c
0  1  0  0
1  0  1  0

# Finally, reduce the DataFrame to one column (actually pd.Series)
# with the selected choice
df_result = df_selections.idxmax(axis=1)
print(df_result)
0    a
1    b
dtype: object
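
If pandas is not needed, the same idea can be sketched with NumPy arrays only. This assumes a NumPy version in which Generator.multinomial accepts a 2-D pvals (to my knowledge, 1.22 or later); the variable names are only illustrative.

```python
import numpy as np

l = np.array([0, 1, 2])
P = np.array([[0.8, 0.1, 0.1],
              [0.3, 0.3, 0.4]])

rng = np.random.default_rng()

# One multinomial trial per row of P: each output row is one-hot,
# with the 1 marking the selected column.
one_hot = rng.multinomial(1, P)

# Position of the 1 in each row, then map it onto the values in l
idx = one_hot.argmax(axis=1)
samples = l[idx]
print(samples)  # one entry per row of P
```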
Answered By: Azrael_DD