Is there a way to vectorize this loop?

Question:

I’m trying to simulate the results of two different dice. One die is fair (i.e. the probability of each number is 1/6), but the other isn’t.

I have a numpy array with 0’s and 1’s saying which die is used every time, 0 being the fair one and 1 the other. I’d like to compute another numpy array with the results. In order to do this task, I have used the following code:

import numpy as np
import numpy.random as rnd

def dice_simulator(dices: np.ndarray) -> np.ndarray:
    n = len(dices)
    results = np.zeros(n)
    i = 0
    for dice in np.nditer(dices):
        if dice:
            # loaded die: faces 4-6 are three times as likely as faces 1-3
            results[i] = rnd.choice(6, p=[1/12, 1/12, 1/12, 1/4, 1/4, 1/4]) + 1
        else:
            results[i] = rnd.choice(6) + 1
        i += 1
    return results

This takes a lot of time compared to the rest of the program, and I think it is because I’m iterating over a numpy array instead of using vectorized operations. Can anyone help me with that?

Answers:

Try this:

def dice_simulator(dices):
    p = [1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4]
    size = dices.shape
    fair_die = np.random.choice(6, size=size)
    unfair_die = np.random.choice(6, p=p, size=size)
    return (dices == 0) * fair_die + (dices == 1) * unfair_die + 1
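For example (a minimal usage sketch; the input array of die indicators is made up here):

dices = np.array([0, 1, 1, 0, 1])  # 0 = fair die, 1 = loaded die
print(dice_simulator(dices))       # e.g. [3 6 4 1 5]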
Answered By: Riccardo Bucco

This is the correct way to do it:

def dice_simulator(dices: np.ndarray) -> np.ndarray:
    return np.where(
        dices,
        rnd.choice(6, dices.shape, p = [1/12, 1/12, 1/12, 1/4, 1/4, 1/4]),
        rnd.choice(6, dices.shape)
    ) + 1

Edit: as noted in the other answers, this generates both random arrays at full size, which may be wasteful. A method based on @Claudio’s answer that avoids any over-generation is the following:

def dice_simulator_slices_improved(dices):
    if dices.dtype != bool:
        dices = dices.astype(bool)  # because we will iterate over it 3 times
    N = dices.shape[0]
    n_ones = np.count_nonzero(dices)
    n_zeros = N - n_ones
    results = np.empty(N, dtype=float)  # reserve the output array
    results[np.logical_not(dices)] = np.random.choice([1, 2, 3, 4, 5, 6], size=n_zeros)
    results[dices] = np.random.choice(
        [1, 2, 3, 4, 5, 6], size=n_ones, p=[1/12, 1/12, 1/12, 1/4, 1/4, 1/4])
    return results

This is typically the fastest way to do it with zero over-generation. Whether np.where or this non-overgenerating method wins depends on how expensive the two filler arrays are to compute. If their calculation is trivial, like inserting 0 and 1, then np.where is almost 5 times faster, because it only iterates over dices once. But if the generation is as expensive as np.random.choice with the p parameter, which happens to be very expensive, then no over-generation is the way to go.
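A minimal sketch of how that trade-off can be measured with timeit (the array size and the trivial-fill example are my own assumptions):

import timeit
import numpy as np

dices = np.random.randint(0, 2, size=1_000_000)
p = [1/12, 1/12, 1/12, 1/4, 1/4, 1/4]

# trivial fill values: np.where touches dices only once and wins
t_trivial = timeit.timeit(lambda: np.where(dices, 1, 0), number=10)

# expensive fill values: over-generating two full weighted arrays loses
t_overgen = timeit.timeit(
    lambda: np.where(dices,
                     np.random.choice(6, size=dices.shape, p=p),
                     np.random.choice(6, size=dices.shape)) + 1,
    number=10)
t_slices = timeit.timeit(lambda: dice_simulator_slices_improved(dices), number=10)
print(t_trivial, t_overgen, t_slices)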

Answered By: Ahmed AEK

The answers already given vectorize by over-generating and then throwing away some of the outputs, which seems wrong.

Moreover, I will generalize to any number of dice.

First, you need to build a condlist: a list whose length is the number of dice, where the i-th element is a boolean array that is True wherever the i-th die should be used:

dices_idxs = np.array([0, 1, 2])
dices_sequence = np.array([0, 1, 2, 2, 1, 1, 0])

condlist = np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))

print(condlist)

# [[ True False False False False False  True]
#  [False  True False False  True  True False]
#  [False False  True  True False False False]]

Second, you can generalize the answer given by @Ahmed AEK using np.select:

def dice_simulator_select(dices_sequence, dices_weights):
    faces = np.arange(1, 7)
    num_dices = len(dices_weights)
    dices_idxs = np.arange(num_dices)
    num_throws = len(dices_sequence)

    condlist = list(
        np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))
    )
    choicelist = [
        RNG.choice(faces, size=num_throws, p=dices_weights[dice_idx])
        for dice_idx in range(num_dices)
    ]
    return np.select(condlist, choicelist)

But it has the issue stated at first: it over-generates and then discards some of the generated values, which can be problematic where randomness is concerned.

A more correct way is to use np.piecewise:

def dice_simulator_piecewise(dices_sequence, dices_weights):
    faces = np.arange(1, 7)
    num_dices = len(dices_weights)
    dices_idxs = np.arange(num_dices)

    condlist = list(
        np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))
    )
    # note: size=len(x) ensures no more samples than needed are generated
    funclist = [
        lambda x: RNG.choice(faces, size=len(x), p=dices_weights[int(x[0])])
    ] * num_dices

    return np.piecewise(dices_sequence, condlist, funclist)

You can use the functions as follows, and see that the correct function using np.piecewise is even faster (about 20% faster in the case below):

RNG = np.random.default_rng()

dices_weights = [
    None,  # uniform
    [1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4],
    None,
    [1 / 4, 1 / 4, 1 / 4, 1 / 12, 1 / 12, 1 / 12],
    None,
    [1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4],
]
num_dices = len(dices_weights)
num_throws = 1_000
dices_sequence = RNG.choice(np.arange(num_dices), size=num_throws)


%timeit dice_simulator_select(dices_sequence, dices_weights)
%timeit dice_simulator_piecewise(dices_sequence, dices_weights)

# 311 µs ± 5.94 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# 240 µs ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Answered By: paime

This is the fastest way, and because speed matters (it was the issue that led to the question), it is also so far the best solution:

def dice_simulator_slices(dices):
    results = RNG.integers(1, high=6, endpoint=True, size=dices.shape[0])
    results[dices == 1] = RNG.choice([1, 2, 3, 4, 5, 6],
        size=get_size(dices), p=[1/12, 1/12, 1/12, 1/4, 1/4, 1/4])
    return results

Here are the required imports for the function above:

import numpy as np
RNG = np.random.default_rng()
get_size = np.count_nonzero
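A quick usage sketch (the example input is made up):

dices = RNG.integers(0, 2, size=10)  # 0 = fair die, 1 = loaded die
print(dice_simulator_slices(dices))  # e.g. [2 5 6 4 1 4 6 3 5 2]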

Now let’s compare the timings of the other solutions to the one above:

dice_simulator_piecewise  SIZE = 100_000_000 : 5.483888
dice_simulator_add_arrays SIZE = 100_000_000 : 5.148283
dice_simulator_np_where   SIZE = 100_000_000 : 4.838409
dice_simulator_slices_gen SIZE = 100_000_000 : 3.437379
dice_simulator_slices     SIZE = 100_000_000 : 2.976977

Maybe surprising in the above results is that optimizing by not over-generating can slow things down, so over-generating need not be wrong.

The current state of my own knowledge is that, as stated by Ahmed AEK in his answer, the calculation of random choices when a weight is not None (notice that in numpy the weight parameter is called ‘p’, not ‘weight’) is the main speed bottleneck.
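That bottleneck is easy to see in isolation (a minimal sketch; the sample size is my own choice):

import timeit

n = 1_000_000
p = [1/12, 1/12, 1/12, 1/4, 1/4, 1/4]
t_uniform = timeit.timeit(lambda: RNG.integers(1, 7, size=n), number=10)
t_weighted = timeit.timeit(lambda: RNG.choice(np.arange(1, 7), size=n, p=p), number=10)
print(t_uniform, t_weighted)  # the weighted draw is typically many times slower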


A bit slower, but still faster than the other proposed solutions (see the timings above), is my ‘generic slices’ solution, which supports any number of dice as the ‘piecewise’ solution does:

def dice_simulator_slices_gen(arr_dice_nums, arr_dice_num_weight):
    faces = np.arange(1, 7)
    results = np.empty(arr_dice_nums.shape[0], dtype=np.int8)
    for dice_num, weight in enumerate(arr_dice_num_weight):
        bln_slice = arr_dice_nums == dice_num
        no_throws = np.count_nonzero(bln_slice)
        if weight is None:
            results[bln_slice] = RNG.integers(1, high=6, endpoint=True, size=no_throws)
        else:
            results[bln_slice] = RNG.choice(faces, p=weight, size=no_throws)
    return results
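It can be called with the same kind of dices_weights list used in the timings above (a usage sketch):

dices_weights = [
    None,  # uniform (fair die)
    [1/12, 1/12, 1/12, 1/4, 1/4, 1/4],
]
dices_sequence = RNG.choice(len(dices_weights), size=12)
print(dice_simulator_slices_gen(dices_sequence, dices_weights))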
Answered By: Claudio