Is there a way to vectorize this loop?
Question:
I’m trying to simulate the results of two different dice. One die is fair (i.e. the probability of each number is 1/6), but the other isn’t.
I have a numpy array with 0’s and 1’s saying which die is used every time, 0 being the fair one and 1 the other. I’d like to compute another numpy array with the results. In order to do this task, I have used the following code:
import numpy as np
import numpy.random as rnd  # rnd is assumed to be numpy's random module

def dice_simulator(dices : np.ndarray) -> np.ndarray:
    n = len(dices)
    results = np.zeros(n)
    i = 0
    for dice in np.nditer(dices):
        if dice:
            results[i] = rnd.choice(6, p = [1/12, 1/12, 1/12, 1/4, 1/4, 1/4]) + 1
        else:
            results[i] = rnd.choice(6) + 1
        i += 1
    return results
This takes a lot of time compared to the rest of the program, and I think it is because I’m iterating over a numpy array instead of using vectorized operations. Can anyone help me with that?
Answers:
Try this:
def dice_simulator(dices):
    p = [1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4]
    size = dices.shape
    fair_die = np.random.choice(6, size=size)
    unfair_die = np.random.choice(6, p=p, size=size)
    return (dices == 0) * fair_die + (dices == 1) * unfair_die + 1
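As a quick sanity check, here is a minimal, self-contained sketch of this add-and-multiply approach; the function body is repeated so the snippet runs on its own, and the `dices` input is made-up test data:

```python
import numpy as np

def dice_simulator(dices):
    p = [1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4]
    size = dices.shape
    fair_die = np.random.choice(6, size=size)          # values 0..5, uniform
    unfair_die = np.random.choice(6, p=p, size=size)   # values 0..5, weighted
    # exactly one of the two masks is 1 at each position, so each throw
    # picks one die's result; +1 maps 0..5 to faces 1..6
    return (dices == 0) * fair_die + (dices == 1) * unfair_die + 1

dices = np.random.randint(0, 2, size=1000)  # random mix of die 0 and die 1
results = dice_simulator(dices)
assert results.shape == dices.shape
assert results.min() >= 1 and results.max() <= 6
```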
this is the correct way to do it.
def dice_simulator(dices: np.ndarray) -> np.ndarray:
    return np.where(
        dices,
        rnd.choice(6, dices.shape, p = [1/12, 1/12, 1/12, 1/4, 1/4, 1/4]),
        rnd.choice(6, dices.shape)
    ) + 1
Edit: as noted in the other answers, this answer generates both random arrays at full size, which may be wasteful. One way to avoid any over-generation, based on @Claudio's answer, is the following.
def dice_simulator_slices_improved(dices):
    if dices.dtype != bool:
        dices = dices.astype(bool)  # because we will iterate over it 3 times
    N = dices.shape[0]
    n_ones = np.count_nonzero(dices)
    n_zeros = N - n_ones
    results = np.empty(N, dtype=float)  # reserve output array
    results[np.logical_not(dices)] = np.random.choice([1, 2, 3, 4, 5, 6], size=n_zeros)
    results[dices] = np.random.choice(
        [1, 2, 3, 4, 5, 6], size=n_ones, p=[1/12, 1/12, 1/12, 1/4, 1/4, 1/4])
    return results
This is typically the fastest way to do it with zero over-generation. The difference between np.where and this non-over-generating method depends on how the two filler arrays are computed: if their calculation is trivial, like inserting 0 and 1, then np.where is almost 5 times faster, because it only iterates over dices once; but if the generation is as expensive as np.random.choice with the p parameter, which happens to be very expensive, then zero over-generation is the way to go.
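For concreteness, the zero-over-generation routine above can be exercised like this (a minimal sketch; the five-element `dices` input is made-up test data):

```python
import numpy as np

def dice_simulator_slices_improved(dices):
    if dices.dtype != bool:
        dices = dices.astype(bool)  # we index with it three times
    n_ones = np.count_nonzero(dices)
    n_zeros = dices.shape[0] - n_ones
    results = np.empty(dices.shape[0], dtype=float)
    # each branch generates exactly as many throws as it needs
    results[np.logical_not(dices)] = np.random.choice([1, 2, 3, 4, 5, 6], size=n_zeros)
    results[dices] = np.random.choice(
        [1, 2, 3, 4, 5, 6], size=n_ones, p=[1/12, 1/12, 1/12, 1/4, 1/4, 1/4])
    return results

dices = np.array([0, 1, 1, 0, 1])
out = dice_simulator_slices_improved(dices)
assert out.shape == (5,)
assert set(out).issubset({1, 2, 3, 4, 5, 6})
```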
The answers already given vectorize by over-generating and throwing away some of the outputs, which seems wrong. Moreover, I will generalize to any number of dice.
First, you need to be able to get a condlist: a list whose length is the number of dice, where the i-th element is a boolean array that is True wherever the i-th die should be used:
dices_idxs = np.array([0, 1, 2])
dices_sequence = np.array([0, 1, 2, 2, 1, 1, 0])
condlist = np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))
print(condlist)
# [[ True False False False False False True]
# [False True False False True True False]
# [False False True True False False False]]
Second, you can generalize the answer given by @Ahmed AEK using np.select:
def dice_simulator_select(dices_sequence, dices_weights):
    faces = np.arange(1, 7)
    num_dices = len(dices_weights)
    dices_idxs = np.arange(num_dices)
    num_throws = len(dices_sequence)
    condlist = list(
        np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))
    )
    choicelist = [
        RNG.choice(faces, size=num_throws, p=dices_weights[dice_idx])
        for dice_idx in range(num_dices)
    ]
    return np.select(condlist, choicelist)
But it has the issue stated at first: it over-generates and then discards some generated values, which can be problematic where randomness is concerned.
A more correct way is to use np.piecewise:
def dice_simulator_piecewise(dices_sequence, dices_weights):
    faces = np.arange(1, 7)
    num_dices = len(dices_weights)
    dices_idxs = np.arange(num_dices)
    condlist = list(
        np.equal(*np.broadcast_arrays(dices_sequence[None, :], dices_idxs[:, None]))
    )
    # note: size=len(x) ensures no more samples than needed are generated
    funclist = [
        lambda x: RNG.choice(faces, size=len(x), p=dices_weights[int(x[0])])
    ] * num_dices
    return np.piecewise(dices_sequence, condlist, funclist)
You can use the functions as follows, and see that the correct function using np.piecewise is even faster (about 20% in the case below):
RNG = np.random.default_rng()
dices_weights = [
None, # uniform
[1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4],
None,
[1 / 4, 1 / 4, 1 / 4, 1 / 12, 1 / 12, 1 / 12],
None,
[1 / 12, 1 / 12, 1 / 12, 1 / 4, 1 / 4, 1 / 4],
]
num_dices = len(dices_weights)
num_throws = 1_000
dices_sequence = RNG.choice(np.arange(num_dices), size=num_throws)
%timeit dice_simulator_select(dices_sequence, dices_weights)
%timeit dice_simulator_piecewise(dices_sequence, dices_weights)
# 311 µs ± 5.94 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# 240 µs ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
This is the fastest way, and because speed matters (that was the issue leading to the question), it is also so far the best solution:
def dice_simulator_slices(dices):
    results = RNG.integers(1, high=6, endpoint=True, size=dices.shape[0])
    results[dices == 1] = RNG.choice(
        [1, 2, 3, 4, 5, 6],
        size=get_size(dices), p=[1/12, 1/12, 1/12, 1/4, 1/4, 1/4])
    return results
Here are the required imports for the function above:
import numpy as np
RNG = np.random.default_rng()
get_size = np.count_nonzero
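Putting the imports and the function together, a minimal usage sketch looks like this (the `dices` input is made-up test data):

```python
import numpy as np

RNG = np.random.default_rng()
get_size = np.count_nonzero

def dice_simulator_slices(dices):
    # fair throws everywhere first, then overwrite the positions of the unfair die
    results = RNG.integers(1, high=6, endpoint=True, size=dices.shape[0])
    results[dices == 1] = RNG.choice(
        [1, 2, 3, 4, 5, 6],
        size=get_size(dices), p=[1/12, 1/12, 1/12, 1/4, 1/4, 1/4])
    return results

dices = RNG.integers(0, 2, size=10_000)  # values 0 or 1 (endpoint defaults to False)
results = dice_simulator_slices(dices)
assert results.shape == dices.shape
assert results.min() >= 1 and results.max() <= 6
```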
Now let’s compare the timings of the other solutions to the one above:
dice_simulator_piecewise SIZE = 100_000_000 : 5.483888
dice_simulator_add_arrays SIZE = 100_000_000 : 5.148283
dice_simulator_np_where SIZE = 100_000_000 : 4.838409
dice_simulator_slices_gen SIZE = 100_000_000 : 3.437379
dice_simulator_slices SIZE = 100_000_000 : 2.976977
Maybe surprising in the above results is that optimizing by not over-generating can slow things down, so over-generating need not be wrong.
The current state of my own knowledge is that (as stated by Ahmed AEK in his answer) the computation of random choices when a weight is not None (note that in numpy the weight parameter is called ‘p’, not ‘weight’) is the main speed bottleneck.
A bit slower, but still faster than the other proposed solutions (see the timings above), is my ‘generic slices’ solution, which supports any number of dice like the ‘piecewise’ solution does:
def dice_simulator_slices_gen(arr_dice_nums, arr_dice_num_weight):
    faces = np.arange(1, 7)
    results = np.empty(arr_dice_nums.shape[0], dtype=np.int8)
    for dice_num, weight in enumerate(arr_dice_num_weight):
        bln_slice = arr_dice_nums == dice_num
        no_throws = np.count_nonzero(bln_slice)
        if weight is None:
            results[bln_slice] = RNG.integers(1, high=6, endpoint=True, size=no_throws)
        else:
            results[bln_slice] = RNG.choice(faces, p=weight, size=no_throws)
    return results
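A usage sketch of this generic version with three dice (the weight list below is illustrative; None marks a fair die):

```python
import numpy as np

RNG = np.random.default_rng()

def dice_simulator_slices_gen(arr_dice_nums, arr_dice_num_weight):
    faces = np.arange(1, 7)
    results = np.empty(arr_dice_nums.shape[0], dtype=np.int8)
    for dice_num, weight in enumerate(arr_dice_num_weight):
        bln_slice = arr_dice_nums == dice_num
        no_throws = np.count_nonzero(bln_slice)
        if weight is None:  # fair die: uniform integers 1..6
            results[bln_slice] = RNG.integers(1, high=6, endpoint=True, size=no_throws)
        else:               # loaded die: weighted choice over the faces
            results[bln_slice] = RNG.choice(faces, p=weight, size=no_throws)
    return results

weights = [None, [1/12, 1/12, 1/12, 1/4, 1/4, 1/4], None]  # dice 0 and 2 are fair
seq = RNG.integers(0, 3, size=1_000)  # which die is thrown at each position
out = dice_simulator_slices_gen(seq, weights)
assert out.shape == seq.shape
assert out.min() >= 1 and out.max() <= 6
```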