Select array elements with variable index bounds in numpy

Question:

This might not be possible, since the intermediate array would have variable-length rows.
What I am trying to accomplish is assigning a value to the elements of an array whose column index falls within the range given, per row, by my array of bounds (start inclusive, stop exclusive). As an example:

bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
__assign(array, bounds, 1)

after the assignment should result in

array = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1]
]

I have tried something like this in various iterations without success:

ind = np.arange(array.shape[0])
array[ind, bounds[ind][0]:bounds[ind][1]] = 1
      

I am trying to avoid loops as this function will be called a lot. Any ideas?

Asked By: Daniele Bernardini


Answers:

I’m by no means a NumPy expert, but of the array indexing options I could find, this was the fastest solution I came up with:

import numpy as np

bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
for i, x in enumerate(bounds):
    cols = slice(x[0], x[1])  # half-open column range [start, stop) for row i
    array[i, cols] = 1

Here we iterate through the list of bounds and reference the columns using slices.
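
As a minimal sketch, the loop above can be wrapped in a reusable function. The name assign_ranges is hypothetical (it stands in for the __assign placeholder from the question), and the bounds are assumed to be half-open [start, stop) pairs, one per row:

```python
import numpy as np

def assign_ranges(array, bounds, value):
    """Set array[i, start:stop] = value for each (start, stop) row of bounds.

    Hypothetical helper; modifies `array` in place and returns it.
    """
    for i, (start, stop) in enumerate(bounds):
        array[i, start:stop] = value
    return array

bounds = np.array([[1, 2], [1, 3], [1, 4]])
array = assign_ranges(np.zeros((3, 4)), bounds, 1)
print(array)
```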

I also tried first building a list of row indices and a list of column indices, as shown below, but that was far slower: over 10 seconds versus 0.04 seconds on my laptop for a 10 000 x 10 000 array. I guess the slices make a huge difference.

bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
cols = []
rows = []
for i, x in enumerate(bounds):
    cols += list(range(x[0], x[1])) 
    rows += (x[1] - x[0]) * [i]

# print(cols) -> [1, 1, 2, 1, 2, 3]
# print(rows) -> [0, 1, 1, 2, 2, 2]

array[rows, cols] = 1
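
The two index lists can also be built without a Python loop. The following is a sketch using np.repeat and np.cumsum, under the assumption that every row has stop >= start. It was not part of the timings quoted above, so no speed claim is made for it:

```python
import numpy as np

bounds = np.array([[1, 2], [1, 3], [1, 4]])
array = np.zeros((3, 4))

starts = bounds[:, 0]
lengths = bounds[:, 1] - starts  # number of columns to set in each row

# Row index repeated once per column to set: [0, 1, 1, 2, 2, 2]
rows = np.repeat(np.arange(len(bounds)), lengths)

# Position of each entry within its own row's run: [0, 0, 1, 0, 1, 2]
run_starts = np.cumsum(lengths) - lengths
within = np.arange(lengths.sum()) - np.repeat(run_starts, lengths)

# Shift by each row's start to get the column index: [1, 1, 2, 1, 2, 3]
cols = np.repeat(starts, lengths) + within

array[rows, cols] = 1
print(array)
```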
Answered By: D Malan

One of the issues with a purely NumPy approach is that there is no built-in way to slice a NumPy array along an axis using per-row bounds taken from another array. The expanded bounds therefore end up as a variable-length list of lists such as [[1],[1,2],[1,2,3]]. You can then use np.eye and np.sum over axis=0 to get the required output.

bounds = np.array([[1,2], [1,3], [1,4]])

result = np.stack([np.sum(np.eye(4)[slice(*i)], axis=0) for i in bounds])
result
array([[0., 1., 0., 0.],
       [0., 1., 1., 0.],
       [0., 1., 1., 1.]])

I tried various ways to slice np.eye(4) as [start:stop] over a NumPy array of starts and stops, but sadly this slicing approach needs an iteration.
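
A loop-free alternative that sidesteps slicing altogether is to compare a range of column indices against the bounds with broadcasting, which yields a boolean mask. This is a sketch of a different technique from the np.eye approach above, assuming half-open [start, stop) bounds. Note that it builds a temporary mask with the same shape as the array:

```python
import numpy as np

bounds = np.array([[1, 2], [1, 3], [1, 4]])
array = np.zeros((3, 4))

cols = np.arange(array.shape[1])  # column indices, shape (4,)

# bounds[:, :1] and bounds[:, 1:] both have shape (3, 1),
# so each comparison broadcasts against cols to shape (3, 4).
mask = (cols >= bounds[:, :1]) & (cols < bounds[:, 1:])

array[mask] = 1
print(array)
```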


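EDIT: Another way to do this without writing an explicit loop is np.apply_along_axis. Note that it still calls the function once per row in Python, so it is a convenience rather than true vectorization: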

def f(b):
    o = np.sum(np.eye(4)[b[0]:b[1]], axis=0)
    return o

np.apply_along_axis(f, 1, bounds)
array([[0., 1., 0., 0.],
       [0., 1., 1., 0.],
       [0., 1., 1., 1.]])

EDIT: If you are looking for a very fast solution and can tolerate a single for loop, then the fastest approach among all the answers in this thread, based on my own timing tests, is

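def h(bounds):
    # the output width is taken from the largest stop bound
    zz = np.zeros((len(bounds), bounds.max()))

    for z,b in zip(zz,bounds):
        z[b[0]:b[1]]=1  # z is a view of one row of zz, so this writes in place

    return zz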

h(bounds)
array([[0., 1., 0., 0.],
       [0., 1., 1., 0.],
       [0., 1., 1., 1.]])
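
Note that h derives the output width from bounds.max(), which matches the question's example only because the largest stop bound equals the number of columns. A sketch of a variant that takes the width as an argument follows; the name h_with_width and the n_cols parameter are hypothetical:

```python
import numpy as np

def h_with_width(bounds, n_cols):
    """Like h above, but with an explicit number of columns (hypothetical variant)."""
    out = np.zeros((len(bounds), n_cols))
    for row, (start, stop) in zip(out, bounds):
        row[start:stop] = 1  # `row` is a view into `out`, so this writes in place
    return out

# The largest stop bound here is 3, but the result still has 4 columns.
bounds = np.array([[1, 2], [1, 3], [1, 3]])
result = h_with_width(bounds, 4)
print(result)
```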
Answered By: Akshay Sehgal

Using the numba.njit decorator:

import numpy as np
import numba

@numba.njit
def numba_assign_in_range(arr, bounds, val):
    for i in range(len(bounds)):
        s, e = bounds[i]
        arr[i, s:e] = val
    return arr

test_size = int(1e6) * 2

bounds = np.zeros((test_size, 2), dtype='int32')
bounds[:, 0] = 1
bounds[:, 1] = np.random.randint(0, 100, test_size)

a = np.zeros((test_size, 100))

with numba.njit

CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 6.2 µs

without numba.njit

CPU times: user 3.54 s, sys: 1.63 ms, total: 3.54 s
Wall time: 3.55 s
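
The timing commands themselves are not shown above. As a hedged sketch of how such a comparison could be set up, the following calls the jitted function once as a warm-up so that compilation time is excluded, then times a second call. The sizes are smaller than in the answer, the try/except fallback to plain Python is an addition so the snippet also runs without Numba installed, and no particular timings are implied:

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to an identity decorator so the sketch still runs.
    def njit(func):
        return func

@njit
def assign_in_range(arr, bounds, val):
    # Explicit indexing of the two bounds instead of tuple unpacking.
    for i in range(len(bounds)):
        arr[i, bounds[i, 0]:bounds[i, 1]] = val
    return arr

n_rows, n_cols = 10_000, 100
rng = np.random.default_rng(0)
bounds = np.zeros((n_rows, 2), dtype=np.int64)
bounds[:, 0] = 1
bounds[:, 1] = rng.integers(1, n_cols, n_rows)  # stop bounds in [1, n_cols)

# Warm-up call: with Numba, the first call triggers compilation.
assign_in_range(np.zeros((n_rows, n_cols)), bounds, 1.0)

a = np.zeros((n_rows, n_cols))
t0 = time.perf_counter()
assign_in_range(a, bounds, 1.0)
elapsed = time.perf_counter() - t0
print(f"second call took {elapsed:.6f} s")
```

Timing the first call instead would mostly measure compilation rather than the loop itself.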
Answered By: 4.Pi.n