Accessing NumPy array elements not in a given index list
Question:
I have a NumPy array with shape (100, 170, 256) and an array of indexes, [0, 10, 20, 40, 70].
I can get the sub-arrays corresponding to the indexes as follows:
sub_array = array[..., index]
This returns an array with the shape (100, 170, 5) as expected. Now, I am trying to take the complement and get the sub-array NOT corresponding to those indexes. So, I did:
sub_array = array[..., ~index]
This still returns an array of shape (100, 170, 5) for some reason. How can I take the complement of these indexes in Python?
[EDIT]
Also tried:
sub_array = array[..., not(index.any)]
However, this does not do what I want either (I expect an array of shape (100, 170, 251)).
Answers:
I tend to work with boolean arrays rather than indices where possible to avoid this issue. You could use in1d to get one, even though it isn’t very pretty:
>>> arr[..., index].shape
(100, 170, 5)
>>> arr[..., np.in1d(np.arange(arr.shape[-1]),index)].shape
(100, 170, 5)
>>> arr[..., ~np.in1d(np.arange(arr.shape[-1]),index)].shape
(100, 170, 251)
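In recent NumPy releases np.in1d is deprecated in favour of np.isin, so the same mask could be sketched as below. The zero-filled array is placeholder data; only the shapes from the question are assumed.

```python
import numpy as np

# Placeholder data with the shape from the question.
arr = np.zeros((100, 170, 256))
index = np.array([0, 10, 20, 40, 70])

# True for every position along the last axis that is listed in `index`.
in_index = np.isin(np.arange(arr.shape[-1]), index)

selected = arr[..., in_index]      # the listed positions
complement = arr[..., ~in_index]   # everything else

print(selected.shape)
print(complement.shape)
```

Here ~ is applied to a boolean array, so it is a logical negation and the complement has 251 entries along the last axis.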
Have a look at what ~index gives you. On an integer array, ~ is a bitwise NOT (~x == -x - 1), so the result is:
array([ -1, -11, -21, -41, -71])
So your call
sub_array = array[..., ~index]
returns 5 entries, corresponding to indices [-1, -11, -21, -41, -71], i.e. 255, 245, 235, 215 and 185 in your case.
Similarly, not(index.any) gives
False
because index.any without parentheses is a method object, which is always truthy. That is why your second attempt doesn’t work.
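A small check of this behaviour, assuming index is an integer NumPy array and the last axis has length 256 as in the question:

```python
import numpy as np

index = np.array([0, 10, 20, 40, 70])

# ~ on an integer array flips the bits of each element.
flipped = ~index
print(flipped)

# Negative indices count from the end of the axis, so on an axis of
# length 256 they refer to these positions.
wrapped = flipped % 256
print(wrapped)

# `index.any` (no call) is a bound method, which is truthy,
# so negating it yields a plain Python False.
print(not index.any)
```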
This should work:
sub_array = array[..., [i for i in range(256) if i not in index]]
I’m assuming index is a NumPy array. If so, the explanation of what the tilde operator does can be found here:
What does the unary operator ~ do in numpy?
As for what you’re trying to accomplish, you could assemble a complementary index array:
notIndex = numpy.array([i for i in range(256) if i not in index])
And then use notIndex instead of index.
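The list comprehension scans index once for every candidate position. As a possible alternative that is not part of the original answer, np.setdiff1d can build the same complementary index array in one call (it returns sorted, unique values). The zero-filled array below is placeholder data.

```python
import numpy as np

# Placeholder data with the shape from the question.
array = np.zeros((100, 170, 256))
index = np.array([0, 10, 20, 40, 70])

# All positions along the last axis that do not appear in `index`.
not_index = np.setdiff1d(np.arange(array.shape[-1]), index)

sub_array = array[..., not_index]
print(sub_array.shape)
```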
The way you have your data, the simplest approach is to use np.delete:
sub_array = np.delete(array, index, axis=2)
Alternatively, the logical operators you were trying to use can be applied with boolean arrays as @DSM suggests:
mask = np.ones(array.shape[2], dtype=bool)
mask[index] = False
sub_array = array[:, :, mask]
(I wouldn’t call your array array, but I followed the names in your question.)
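To check that the two approaches agree, here is a short sketch on random placeholder data with the question's shape:

```python
import numpy as np

rng = np.random.default_rng(0)
array = rng.random((100, 170, 256))
index = np.array([0, 10, 20, 40, 70])

# Approach 1: drop the listed positions along the last axis.
deleted = np.delete(array, index, axis=2)

# Approach 2: keep only the positions where the mask is True.
mask = np.ones(array.shape[2], dtype=bool)
mask[index] = False
masked = array[:, :, mask]

print(deleted.shape, masked.shape)
print(np.array_equal(deleted, masked))
```

Both results are new arrays, so the original array is left unchanged.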
The question is answered, but I propose a benchmark of the three methods here.
The fastest solution is the boolean mask (with both small and larger index arrays):
mask = np.ones(arr.size, dtype=bool)
mask[indexes] = False
result = arr[mask]
It is more than 2000 times faster than the list comprehension and marginally faster than np.delete.
Code to reproduce
Three proposed solutions: list comprehension (sol1), boolean mask (sol2) or np.delete (sol3).
import numpy as np
d = 100000
a = np.random.rand(d)
idx = np.random.randint(d, size=10)
# list comprehension
def sol1(arr, indexes):
    return arr[[i for i in range(arr.size) if i not in indexes]]
sol1(a, idx)
# Out[30]: array([0.13044518, 0.68564961, 0.03033223, ..., 0.03796257, 0.40137137, 0.45403929])
# boolean mask
def sol2(arr, indexes):
    mask = np.ones(arr.size, dtype=bool)
    mask[indexes] = False
    return arr[mask]
sol2(a, idx)
# Out[32]: array([0.13044518, 0.68564961, 0.03033223, ..., 0.03796257, 0.40137137, 0.45403929])
# np.delete
def sol3(arr, indexes):
    return np.delete(arr, indexes)
sol3(a, idx)
# Out[36]: array([0.13044518, 0.68564961, 0.03033223, ..., 0.03796257, 0.40137137, 0.45403929])
Results
%timeit sol1(a, idx)
384 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit sol2(a, idx)
154 µs ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit sol3(a, idx)
194 µs ± 18.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
idx = np.random.randint(d, size = 1000)
%timeit sol1(a, idx)
386 ms ± 7.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit sol2(a, idx)
171 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit sol3(a, idx)
205 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
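%timeit is an IPython magic, so the timings above need an IPython or Jupyter session. Outside IPython, a rough equivalent could use the standard timeit module, as in the sketch below. The smaller d and the repeat count of 5 are assumptions chosen to keep the list comprehension quick, so absolute numbers will differ from those reported above.

```python
import timeit
import numpy as np

def sol1(arr, indexes):
    # list comprehension
    return arr[[i for i in range(arr.size) if i not in indexes]]

def sol2(arr, indexes):
    # boolean mask
    mask = np.ones(arr.size, dtype=bool)
    mask[indexes] = False
    return arr[mask]

def sol3(arr, indexes):
    # np.delete
    return np.delete(arr, indexes)

d = 2000  # assumption: much smaller than the original 100000
rng = np.random.default_rng(0)
a = rng.random(d)
idx = rng.integers(d, size=10)

# All three approaches should return the same elements.
assert np.array_equal(sol1(a, idx), sol2(a, idx))
assert np.array_equal(sol2(a, idx), sol3(a, idx))

for name, fn in [("sol1", sol1), ("sol2", sol2), ("sol3", sol3)]:
    seconds = timeit.timeit(lambda: fn(a, idx), number=5) / 5
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```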