Vectorization: Each row of the mask contains the column indices to mask for the corresponding row of the array

Question

I have an array and a mask array with the same number of rows. Each row of the mask holds the column indices to set in the corresponding row of the array. How can I vectorize this instead of using a for loop?

Example code:

import numpy as np

a = np.zeros((2, 4))
mask = np.array([[2, 3], [0, 1]])

# I'd like a vectorized way to do this (the numbers of rows and columns are large):
a[0, mask[0]] = 1
a[1, mask[1]] = 1

This is what I want to obtain:

array([[0., 0., 1., 1.],
       [1., 1., 0., 0.]])

==================================

@mozway answered the question, but @AhmedAEK questioned whether the vectorized solution is actually faster than the for loop, so I compared the two:

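import numpy as np

N = 5000
M = 10000
a = np.zeros((N, M))

# pick 3 distinct column indices per row (choice without replacement)
mask = np.random.rand(N, M).argpartition(3, axis=1)[:, :3]

def t1():
    # for-loop version: one fancy-indexing assignment per row
    for i in range(N):
        a[i, mask[i]] = 1

def t2():
    # vectorized version: a column of row indices broadcasts against mask
    a[np.arange(a.shape[0])[:, None], mask] = 1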

I then ran %timeit on both functions in Jupyter; the results were posted as a screenshot, which is not reproduced here.
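
For anyone who wants to rerun this comparison outside Jupyter, here is a minimal sketch using the standard-library timeit module. The sizes are deliberately smaller than the ones above, and the names K, loop_version and vectorized_version are made up for this illustration. Each function allocates its own array, so the allocation cost is included in both timings. No timings are quoted because they depend on the machine and the array sizes.

```python
import timeit

import numpy as np

# Assumed sizes for a quick run; the question uses N = 5000, M = 10000.
N, M, K = 500, 1000, 3

rng = np.random.default_rng(0)
# K distinct column indices per row (choice without replacement)
mask = rng.random((N, M)).argpartition(K, axis=1)[:, :K]


def loop_version():
    """Set the masked entries one row at a time."""
    a = np.zeros((N, M))
    for i in range(N):
        a[i, mask[i]] = 1
    return a


def vectorized_version():
    """Set all masked entries with a single fancy-indexing assignment."""
    a = np.zeros((N, M))
    a[np.arange(N)[:, None], mask] = 1
    return a


# Both versions must produce the same array before timing means anything.
assert np.array_equal(loop_version(), vectorized_version())

repeats = 20
for fn in (loop_version, vectorized_version):
    seconds = timeit.timeit(fn, number=repeats)
    print(f"{fn.__name__}: {seconds / repeats:.6f} s per call")
```

Checking that both versions return the same array first guards against comparing the speed of two functions that do different things.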

Asked By: LI Xuhong

Answer 1

You can use:

a[[[0],[1]], mask] = 1

Or, generating the row indexer programmatically:

a[np.arange(a.shape[0])[:,None], mask] = 1

output:

array([[0., 0., 1., 1.],
       [1., 1., 0., 0.]])

Answered By: mozway
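
As a sketch of why the indexing in this answer works: the row indexer has shape (N, 1) and the mask has shape (N, K), so NumPy broadcasts them to (N, K) and pairs each row index with every column index in the same row of the mask. The snippet below checks this on the question's small example and also shows np.put_along_axis as a possible alternative. The alternative is an addition here and was not part of the answer above. The names rows and b are illustrative only.

```python
import numpy as np

a = np.zeros((2, 4))
mask = np.array([[2, 3], [0, 1]])

# Column vector of row indices, shape (2, 1)
rows = np.arange(a.shape[0])[:, None]

# rows (2, 1) and mask (2, 2) broadcast to a common shape (2, 2):
# element [i, j] of the index pair is (i, mask[i, j]).
print(np.broadcast(rows, mask).shape)

a[rows, mask] = 1
print(a)

# Possible alternative: put_along_axis writes values at the given
# indices along one axis, building the row indexer internally.
b = np.zeros((2, 4))
np.put_along_axis(b, mask, 1, axis=1)
print(np.array_equal(a, b))
```

Both forms modify the array in place, so they can be swapped for each other when the mask lists column indices row by row.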
