Is there a way to vectorize this loop?
Question:
I’m trying to simulate the results of two different dice. One die is fair (i.e. the probability of each number is 1/6), but the other isn’t.
I have a numpy array with 0’s and 1’s saying which die is used every time, 0 being the fair one and 1 the other. I’d like to compute another numpy array with the results. In order to do this task, I have used the following code:
import numpy as np
import numpy.random as rnd  # rnd is assumed to be numpy's random module

def dice_simulator(dices : np.ndarray) -> np.ndarray:
    n = len(dices)
    results = np.zeros(n)
    i = 0
    for dice in np.nditer(dices):
        if dice:
            results[i] = rnd.choice(6, p = [1/12, 1/12, 1/12, 1/4, 1/4, 1/4]) + 1
        else:
            results[i] = rnd.choice(6) + 1
        i += 1
    return results
This takes a lot of time compared to the rest of the program, and I think it is because I’m iterating over a numpy array instead of using vectorized operations. Can anyone help me with that?
Answers:
Try this:
def dice_simulator(dices):
    p = [1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4]
    size = dices.shape
    fair_die = np.random.choice(6, size=size)
    unfair_die = np.random.choice(6, p=p, size=size)
    return (dices == 0) * fair_die + (dices == 1) * unfair_die + 1
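As a quick sanity check, here is a minimal, self-contained sketch of this add-and-multiply approach; the function body is repeated so the snippet runs on its own, and the `dices` input is made-up test data:

```python
import numpy as np

def dice_simulator(dices):
    p = [1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4]
    size = dices.shape
    fair_die = np.random.choice(6, size=size)          # values 0..5, uniform
    unfair_die = np.random.choice(6, p=p, size=size)   # values 0..5, weighted
    # exactly one of the two masks is 1 at each position, so each throw
    # picks one die's result; +1 maps 0..5 to faces 1..6
    return (dices == 0) * fair_die + (dices == 1) * unfair_die + 1

dices = np.random.randint(0, 2, size=1000)  # random mix of die 0 and die 1
results = dice_simulator(dices)
assert results.shape == dices.shape
assert results.min() >= 1 and results.max() <= 6
```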
this is the correct way to do it.
def dice_simulator(dices: np.ndarray) -> np.ndarray:
    return np.where(
        dices,
        rnd.choice(6, dices.shape, p = [1/12, 1/12, 1/12, 1/4, 1/4, 1/4]),
        rnd.choice(6, dices.shape)
    ) + 1
Edit: as noted in the other answers, this answer generates both random arrays at full size, which may be wasteful. One way to avoid any over-generation, based on @Claudio's answer, is the following.
def dice_simulator_slices_improved(dices):
    if dices.dtype != bool:
        dices = dices.astype(bool)  # because we will iterate over it 3 times
    N = dices.shape[0]
    n_ones = np.count_nonzero(dices)
    n_zeros = N - n_ones
    results = np.empty(N, dtype=float)  # reserve output array
    results[np.logical_not(dices)] = np.random.choice([1, 2, 3, 4, 5, 6], size=n_zeros)
    results[dices] = np.random.choice(
        [1, 2, 3, 4, 5, 6], size=n_ones, p=[1/12, 1/12, 1/12, 1/4, 1/4, 1/4])
    return results
This is typically the fastest way to do it with zero over-generation. The difference between np.where and this non-over-generating method depends on how the two filler arrays are computed: if their calculation is trivial, like inserting 0 and 1, then np.where is almost 5 times faster, because it only iterates over dices once; but if the generation is as expensive as np.random.choice with the p parameter, which happens to be very expensive, then zero over-generation is the way to go.
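For concreteness, the zero-over-generation routine above can be exercised like this (a minimal sketch; the five-element `dices` input is made-up test data):

```python
import numpy as np

def dice_simulator_slices_improved(dices):
    if dices.dtype != bool:
        dices = dices.astype(bool)  # we index with it three times
    n_ones = np.count_nonzero(dices)
    n_zeros = dices.shape[0] - n_ones
    results = np.empty(dices.shape[0], dtype=float)
    # each branch generates exactly as many throws as it needs
    results[np.logical_not(dices)] = np.random.choice([1, 2, 3, 4, 5, 6], size=n_zeros)
    results[dices] = np.random.choice(
        [1, 2, 3, 4, 5, 6], size=n_ones, p=[1/12, 1/12, 1/12, 1/4, 1/4, 1/4])
    return results

dices = np.array([0, 1, 1, 0, 1])
out = dice_simulator_slices_improved(dices)
assert out.shape == (5,)
assert set(out).issubset({1, 2, 3, 4, 5, 6})
```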
The answers already given vectorize by over-generating and throwing away some of the outputs, which seems wrong. Moreover, I will generalize to any number of dice.
First, you need to be able to get a condlist: a list whose length is the number of dice, where the i-th element is a boolean array that is True wherever the i-th die should be used:
dices_idxs = np.array([0, 1, 2])
dices_sequence = np.array([0, 1, 2, 2, 1, 1, 0])
condlist = np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))
print(condlist)
# [[ True False False False False False True]
# [False True False False True True False]
# [False False True True False False False]]
Second, you can generalize the answer given by @Ahmed AEK using np.select:
def dice_simulator_select(dices_sequence, dices_weights):
    faces = np.arange(1, 7)
    num_dices = len(dices_weights)
    dices_idxs = np.arange(num_dices)
    num_throws = len(dices_sequence)
    condlist = list(
        np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))
    )
    choicelist = [
        RNG.choice(faces, size=num_throws, p=dices_weights[dice_idx])
        for dice_idx in range(num_dices)
    ]
    return np.select(condlist, choicelist)
But it has the issue stated at first: it over-generates and then discards some generated values, which can be problematic where randomness is concerned.
A more correct way is to use np.piecewise:
def dice_simulator_piecewise(dices_sequence, dices_weights):
    faces = np.arange(1, 7)
    num_dices = len(dices_weights)
    dices_idxs = np.arange(num_dices)
    condlist = list(
        np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))
    )
    # note: size=len(x) ensures no more samples than needed are generated
    funclist = [
        lambda x: RNG.choice(faces, size=len(x), p=dices_weights[int(x[0])])
    ] * num_dices
    return np.piecewise(dices_sequence, condlist, funclist)
You can use the functions as follows, and see that the correct function using np.piecewise is even faster (about 20% in the case below):
RNG = np.random.default_rng()
dices_weights = [
None, # uniform
[1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4],
None,
[1 / 4, 1 / 4, 1 / 4, 1 / 12, 1 / 12, 1 / 12],
None,
[1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4],
]
num_dices = len(dices_weights)
num_throws = 1_000
dices_sequence = RNG.choice(np.arange(num_dices), size=num_throws)
%timeit dice_simulator_select(dices_sequence, dices_weights)
%timeit dice_simulator_piecewise(dices_sequence, dices_weights)
# 311 µs ± 5.94 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# 240 µs ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
This is the fastest way, and because speed matters (that was the issue leading to the question), it is also so far the best solution:
def dice_simulator_slices(dices):
    results = RNG.integers(1, high=6, endpoint=True, size=dices.shape[0])
    results[dices == 1] = RNG.choice(
        [1, 2, 3, 4, 5, 6],
        size=get_size(dices), p=[1/12, 1/12, 1/12, 1/4, 1/4, 1/4])
    return results
Here are the required imports for the function above:
import numpy as np
RNG = np.random.default_rng()
get_size = np.count_nonzero
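Putting the imports and the function together, a minimal usage sketch looks like this (the `dices` input is made-up test data):

```python
import numpy as np

RNG = np.random.default_rng()
get_size = np.count_nonzero

def dice_simulator_slices(dices):
    # fair throws everywhere first, then overwrite the positions of the unfair die
    results = RNG.integers(1, high=6, endpoint=True, size=dices.shape[0])
    results[dices == 1] = RNG.choice(
        [1, 2, 3, 4, 5, 6],
        size=get_size(dices), p=[1/12, 1/12, 1/12, 1/4, 1/4, 1/4])
    return results

dices = RNG.integers(0, 2, size=10_000)  # values 0 or 1 (endpoint defaults to False)
results = dice_simulator_slices(dices)
assert results.shape == dices.shape
assert results.min() >= 1 and results.max() <= 6
```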
Now let’s compare the timings of the other solutions to the one above:
dice_simulator_piecewise SIZE = 100_000_000 : 5.483888
dice_simulator_add_arrays SIZE = 100_000_000 : 5.148283
dice_simulator_np_where SIZE = 100_000_000 : 4.838409
dice_simulator_slices_gen SIZE = 100_000_000 : 3.437379
dice_simulator_slices SIZE = 100_000_000 : 2.976977
Maybe surprising in the above results is that optimizing by not over-generating can slow things down, so over-generating need not be wrong.
The current state of my own knowledge is that (as stated by Ahmed AEK in his answer) the computation of random choices when a weight is not None (note that in numpy the weight parameter is called ‘p’, not ‘weight’) is the main speed bottleneck.
A bit slower, but still faster than the other proposed solutions (see the timings above), is my ‘generic slices’ solution, which supports any number of dice like the ‘piecewise’ solution does:
def dice_simulator_slices_gen(arr_dice_nums, arr_dice_num_weight):
    faces = np.arange(1, 7)
    results = np.empty(arr_dice_nums.shape[0], dtype=np.int8)
    for dice_num, weight in enumerate(arr_dice_num_weight):
        bln_slice = arr_dice_nums == dice_num
        no_throws = np.count_nonzero(bln_slice)
        if weight is None:
            results[bln_slice] = RNG.integers(1, high=6, endpoint=True, size=no_throws)
        else:
            results[bln_slice] = RNG.choice(faces, p=weight, size=no_throws)
    return results
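A usage sketch of this generic version with three dice (the weight list below is illustrative; None marks a fair die):

```python
import numpy as np

RNG = np.random.default_rng()

def dice_simulator_slices_gen(arr_dice_nums, arr_dice_num_weight):
    faces = np.arange(1, 7)
    results = np.empty(arr_dice_nums.shape[0], dtype=np.int8)
    for dice_num, weight in enumerate(arr_dice_num_weight):
        bln_slice = arr_dice_nums == dice_num
        no_throws = np.count_nonzero(bln_slice)
        if weight is None:  # fair die: uniform integers 1..6
            results[bln_slice] = RNG.integers(1, high=6, endpoint=True, size=no_throws)
        else:               # loaded die: weighted choice over the faces
            results[bln_slice] = RNG.choice(faces, p=weight, size=no_throws)
    return results

weights = [None, [1/12, 1/12, 1/12, 1/4, 1/4, 1/4], None]  # dice 0 and 2 are fair
seq = RNG.integers(0, 3, size=1_000)  # which die is thrown at each position
out = dice_simulator_slices_gen(seq, weights)
assert out.shape == seq.shape
assert out.min() >= 1 and out.max() <= 6
```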