Binary random array with a specific proportion of ones?

Question:

What is the efficient(probably vectorized with Matlab terminology) way to generate random number of zeros and ones with a specific proportion? Specially with Numpy?

As my case is special for 1/3, my code is:

import numpy as np 
a=np.mod(np.multiply(np.random.randomintegers(0,2,size)),3)

But is there any built-in function that could handle this more effeciently at least for the situation of K/N where K and N are natural numbers?

Asked By: Cupitor

||

Answers:

You can use numpy.random.binomial. E.g. suppose frac is the proportion of ones:

In [50]: frac = 0.15

In [51]: sample = np.random.binomial(1, frac, size=10000)

In [52]: sample.sum()
Out[52]: 1567
Answered By: Warren Weckesser

A simple way to do this would be to first generate an ndarray with the proportion of zeros and ones you want:

>>> import numpy as np
>>> N = 100
>>> K = 30 # K zeros, N-K ones
>>> arr = np.array([0] * K + [1] * (N-K))
>>> arr
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1])

Then you can just shuffle the array, making the distribution random:

>>> np.random.shuffle(arr)
>>> arr
array([1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0,
       1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1,
       1, 1, 1, 0, 1, 1, 1, 1])

Note that this approach will give you the exact proportion of zeros/ones you request, unlike say the binomial approach. If you don’t need the exact proportion, then the binomial approach will work just fine.

Answered By: mdml

If I understand your problem correctly, you might get some help with numpy.random.shuffle

>>> def rand_bin_array(K, N):
    arr = np.zeros(N)
    arr[:K]  = 1
    np.random.shuffle(arr)
    return arr

>>> rand_bin_array(5,15)
array([ 0.,  1.,  0.,  1.,  1.,  1.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,
        0.,  0.])
Answered By: Abhijit

Yet another approach, using np.random.choice:

>>> np.random.choice([0, 1], size=(10,), p=[1./3, 2./3])
array([0, 1, 1, 1, 1, 0, 0, 0, 0, 0])
Answered By: Jaime

Simple one-liner: you can avoid using lists of integers and probability distributions, which are unintuitive and overkill for this problem in my opinion, by simply working with bools first and then casting to int if necessary (though leaving it as a bool array should work in most cases).

>>> import numpy as np
>>> np.random.random(9) < 1/3.
array([False,  True,  True,  True,  True, False, False, False, False])   
>>> (np.random.random(9) < 1/3.).astype(int)
array([0, 0, 0, 0, 0, 1, 0, 0, 1])    
Answered By: Galactic Ketchup

Another way of getting the exact number of ones and zeroes is to sample indices without replacement using np.random.choice:

arr_len = 30
num_ones = 8

arr = np.zeros(arr_len, dtype=int)
idx = np.random.choice(range(arr_len), num_ones, replace=False)
arr[idx] = 1

Out:

arr

array([0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 0, 1, 0, 0])
Answered By: joelostblom