Select array elements with variable index bounds in numpy
Question:
This might not be possible, as the intermediate array would have variable-length rows.
What I am trying to accomplish is assigning a value to the elements of an array whose index falls within the range given by my array of bounds. As an example:
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
__assign(array, bounds, 1)
After the assignment, the result should be:
array = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1]
]
I have tried something like this in various iterations without success:
ind = np.arange(array.shape[0])
array[ind, bounds[ind][0]:bounds[ind][1]] = 1
I am trying to avoid loops as this function will be called a lot. Any ideas?
Answers:
I’m by no means a NumPy expert, but of the different array indexing options I could find, this was the fastest solution I could come up with:
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
for i, x in enumerate(bounds):
    # x holds the [start, stop) column bounds for row i
    cols = slice(x[0], x[1])
    array[i, cols] = 1
Here we iterate through the list of bounds and reference the columns using slices.
I also tried first constructing a list of column indices and a list of row indices, but it was much slower: over 10 seconds versus 0.04 seconds on my laptop for a 10 000 x 10 000 array. I guess the slices make a huge difference.
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
cols = []
rows = []
for i, x in enumerate(bounds):
    cols += list(range(x[0], x[1]))
    rows += (x[1] - x[0]) * [i]
# print(cols) [1, 1, 2, 1, 2, 3]
# print(rows) [0, 1, 1, 2, 2, 2]
array[rows, cols] = 1
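Much of the cost above likely comes from building the index lists in Python. As a minimal sketch, the same rows and cols can be built with NumPy calls only, assuming every row has stop >= start. The helper name expand_bounds is hypothetical and not part of the original answer.

```python
import numpy as np

def expand_bounds(bounds):
    """Return (rows, cols) index arrays covering [start, stop) for each row."""
    starts = bounds[:, 0]
    lengths = bounds[:, 1] - starts            # number of selected columns per row
    rows = np.repeat(np.arange(len(bounds)), lengths)
    # Offset of each element within its own row's run: 0, 0, 1, 0, 1, 2, ...
    run_starts = np.cumsum(lengths) - lengths
    within = np.arange(lengths.sum()) - np.repeat(run_starts, lengths)
    cols = np.repeat(starts, lengths) + within
    return rows, cols

bounds = np.array([[1, 2], [1, 3], [1, 4]])
array = np.zeros((3, 4))
rows, cols = expand_bounds(bounds)
array[rows, cols] = 1
```

This still materialises one index pair per assigned element, so it may use a lot of memory when the ranges are wide.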
One issue with a purely NumPy solution is that there is no method to ‘slice’ a NumPy array along an axis using bounds taken from another array. The expanded bounds end up as a variable-length list of lists such as [[1], [1, 2], [1, 2, 3]]. You can instead use np.eye and np.sum over axis=0 to get the required output.
bounds = np.array([[1,2], [1,3], [1,4]])
result = np.stack([np.sum(np.eye(4)[slice(*i)], axis=0) for i in bounds])
print(result)
array([[0., 1., 0., 0.],
       [0., 1., 1., 0.],
       [0., 1., 1., 1.]])
I tried various ways of slicing np.eye(4) from [start:stop] over a NumPy array of starts and stops, but sadly you will need an iteration to accomplish this.
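EDIT: Another way to write this without an explicit loop is np.apply_along_axis. Note that it still calls the function once per row in Python, so it is a convenience rather than true vectorization: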
def f(b):
    # Sum the rows of the identity matrix selected by the [start, stop) bounds
    o = np.sum(np.eye(4)[b[0]:b[1]], axis=0)
    return o
np.apply_along_axis(f, 1, bounds)
array([[0., 1., 0., 0.],
       [0., 1., 1., 0.],
       [0., 1., 1., 1.]])
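For a version with no per-row Python calls at all, one option is to compare a row of column indices against the start and stop columns using broadcasting, then assign through the resulting boolean mask. The sketch below assumes non-negative bounds (negative values are not interpreted the way slices interpret them). The function name assign_in_range is hypothetical.

```python
import numpy as np

def assign_in_range(arr, bounds, val):
    """Set arr[i, start_i:stop_i] = val for every row i, in place."""
    cols = np.arange(arr.shape[1])              # shape (n_cols,)
    starts = bounds[:, 0, None]                 # shape (n_rows, 1)
    stops = bounds[:, 1, None]                  # shape (n_rows, 1)
    mask = (cols >= starts) & (cols < stops)    # broadcasts to (n_rows, n_cols)
    arr[mask] = val
    return arr

bounds = np.array([[1, 2], [1, 3], [1, 4]])
array = np.zeros((3, 4))
assign_in_range(array, bounds, 1)
print(array)
```

The trade-off is a temporary boolean array the same shape as arr, so whether this beats a simple loop over slices depends on the array dimensions and is worth measuring.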
EDIT: If you are looking for a very fast solution and can tolerate a single for loop, then the fastest approach among all answers on this thread, based on my simulations, is:
def h(bounds):
    zz = np.zeros((len(bounds), bounds.max()))
    # Each z is a view of one row of zz, so the assignment writes into zz
    for z, b in zip(zz, bounds):
        z[b[0]:b[1]] = 1
    return zz
h(bounds)
array([[0., 1., 0., 0.],
       [0., 1., 1., 0.],
       [0., 1., 1., 1.]])
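The simulations behind this claim are not shown. A rough way to run such a comparison yourself is sketched below with timeit, checking first that both loop variants agree. The function names, the array sizes, and passing the column count explicitly (instead of using bounds.max()) are assumptions made for this sketch.

```python
import timeit
import numpy as np

def loop_enumerate(bounds, n_cols):
    # Index the 2-D output by row number, as in the first answer
    out = np.zeros((len(bounds), n_cols))
    for i, (start, stop) in enumerate(bounds):
        out[i, start:stop] = 1
    return out

def loop_zip(bounds, n_cols):
    # Iterate over row views, as in h() above
    out = np.zeros((len(bounds), n_cols))
    for row, (start, stop) in zip(out, bounds):
        row[start:stop] = 1   # row is a view, so this writes into out
    return out

rng = np.random.default_rng(0)
n_rows, n_cols = 2000, 100
# Sort each pair so that start <= stop
bounds = np.sort(rng.integers(0, n_cols + 1, size=(n_rows, 2)), axis=1)

# Both variants must produce the same array before timing means anything
assert np.array_equal(loop_enumerate(bounds, n_cols), loop_zip(bounds, n_cols))

for fn in (loop_enumerate, loop_zip):
    seconds = timeit.timeit(lambda: fn(bounds, n_cols), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```

Timings depend on the machine and on the shape of the data, so no expected numbers are given here.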
Using the numba.njit decorator:
import numpy as np
import numba
@numba.njit
def numba_assign_in_range(arr, bounds, val):
    # Plain loop over rows; numba compiles it to machine code
    for i in range(len(bounds)):
        s, e = bounds[i]
        arr[i, s:e] = val
    return arr
test_size = int(1e6) * 2
bounds = np.zeros((test_size, 2), dtype='int32')
bounds[:, 0] = 1
bounds[:, 1] = np.random.randint(0, 100, test_size)
a = np.zeros((test_size, 100))
With numba.njit:
CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 6.2 µs
Without numba.njit:
CPU times: user 3.54 s, sys: 1.63 ms, total: 3.54 s
Wall time: 3.55 s
A wall time of 6.2 µs is far too short for filling an array of 2 million rows, so the first figure most likely does not cover the full call. Treat the comparison as indicative rather than exact.