Finding the first and last occurrences in a numpy array

Question:

I have a (large) numpy array of boolean values, where the True indicate the zero-crossing condition of another previously processed signal. In my case, I am only interested in the indexes of the first and last True values in the array and therefore I’d rather not use np.where or np.argwhere in order to avoid going through the entire array unnecessarily.

I’d prefer not to convert my array into a list and use the index() method. Or should I?

So far, the best solution I have come up with is the following:

# Alias for compactness
enum = enumerate

# Example array, desired output = (3, -5))
a = np.array([False, False, False, True, ..., True, False, False, False, False])


out = next(idx for idx in zip((i for i, x in enum(a) if x), (-i-1 for i, x in enum(a[::-1]) if x)))

print(out)
# (3, -5)

But I still find it a bit bulky and wonder whether Numpy already implements such functionality with another syntax. Does it?

Asked By: Martí

||

Answers:

How about this?

import numpy as np

a = np.array([False, False, False, True, False, True, False, False, False, False])

out = (np.argmax(a), a.size - np.argmax(a[::-1]) - 1)

np.argmax() stops at the first True value, so this might solve your problem of not wanting to iterate over the whole array? Inspired by this answer.

Answered By: jan992

Use numpy’s nonzero function, which returns the indices of all non-zero elements in the input array:

a = np.array([False, False, False, True, ..., True, False, False])

# Find the indices of the non-zero elements
indices = np.nonzero(a)

# The first and last indices will be the first and last True values
first = indices[0][0]
last = indices[0][-1]
Answered By: kalzso

Benchmarking results

Well so I have decided to compare the proposed solutions in a typical case scenario. The three approaches are:

@timeit
def gener_sol(a):
    # My OP proposal
    return next(idx for idx in zip((i for i, x in enumerate(a) if x), (a.size-i-1 for i, x in enumerate(a[::-1]) if x)))

@timeit
def nonzero_sol(a):
    # kalzso's proposal
    return np.nonzero(a)[0][[0, -1]]

@timeit
def argmax_sol(a):
    # jan992's proposal
    return np.argmax(a), a.size - np.argmax(a[::-1]) - 1

NOTE: @timeit is a custom wrapper I have to measure functions execution time. Beyond the question whether it is perfect or not, it is the same for all of them so it should be a fair comparison.

And here goes the piece of code to execute them in a loop:

for rep in range(10000):
    N = 2**20  # Array size
    depth = N // 10  # max depth from the edges for first and last True
    
    arr = np.random.randint(0, 2, N).astype(bool)

    # First and last True values are randomly located at both ends
    first = random.randint(0, depth)
    arr[:first] = False
    arr[first]  = True

    last = random.randint(0, depth)
    arr[-last:] = False
    arr[-last]  = True

    arg_res = argmax_sol(arr)
    non_res = nonzero_sol(arr)
    gen_res = gener_sol(arr)

    # Ensure they all return the same
    assert arg_res == tuple(non_res) == gen_res

And the results are the following:

N=2^20 (10000 runs)

argmax_sol....:   10000 runs,       6.38 s  (638.01 μs/run)
nonzero_sol...:   10000 runs,     13.124 s  (1.312 ms/run)
gener_sol.....:   10000 runs,     42.695 s  (4.27 ms/run)

N=2^25 (100 runs)

argmax_sol....:     100 runs,      1.891 s  (18.908 ms/run)
nonzero_sol...:     100 runs,      4.125 s  (41.25 ms/run)
gener_sol.....:     100 runs,     13.799 s  (137.99 ms/run)

So, in this scenario the argmax solution is the clear winner, and the generator the clear loser.
However, I noted that the depth at which the first and last indices are located (up to N/10 in this case) can have a significant impact. A reasonable value in my case is depth=10000 (or even less). Therefore, if I repeat the experiment I obtain:

N=2^20 (10000 runs) depth = 10000

gener_sol.....:     100 runs,    39.967 ms  (399.671 μs/run)
argmax_sol....:     100 runs,    58.851 ms  (588.505 μs/run)
nonzero_sol...:     100 runs,   138.077 ms  (1.381 ms/run)

As you see, in this case the generator yields better results. And the improvement is even larger if depth can be bound to even a smaller number.

See for instance:

N=2^20 (10000 runs) depth=1000

gener_sol.....:   10000 runs,      0.947 s  (94.753 μs/run)
argmax_sol....:   10000 runs,      6.082 s  (608.196 μs/run)
nonzero_sol...:   10000 runs,     14.282 s  (1.428 ms/run)

N=2^25 (1000 runs) depth=1000

gener_sol.....:    1000 runs,      0.053 s  (53.08 μs/run)
argmax_sol....:    1000 runs,      18.84 s  (18.84 ms/run)
nonzero_sol...:    1000 runs,     44.058 s  (44.058 ms/run)

wow, that’s a dramatic improvement! ~6 and ~350 times faster than numpy’s argmax for N=2^20 and 2^25, respectively. So one should not underestimate the power of for loops/generators under appropriate circumstances.

Finally, and to leave a good word for kalszo’s nonzero solution, I do have found that there is a case in which it outperforms the others, and that is when the array is False everywhere except in two locations (the ones to find).

Therefore, changing the line:

# arr = np.random.randint(0, 2, N).astype(bool)
arr = np.zeros(N).astype(bool)

N=2^20 depth=N//10, only two True values

nonzero_sol...:   10000 runs,      1.612 s  (161.186 μs/run)
argmax_sol....:   10000 runs,      6.482 s  (648.178 μs/run)
gener_sol.....:   10000 runs,     44.625 s  (4.463 ms/run)

and with shallower depths:

N=2^20 (10000 runs) depth=10000, only two True values

nonzero_sol...:   10000 runs,      1.571 s  (157.093 μs/run)
gener_sol.....:   10000 runs,      4.208 s  (420.791 μs/run)
argmax_sol....:   10000 runs,      6.235 s  (623.512 μs/run)

N=2^25 (1000 runs) depth=10000, only two True values

gener_sol.....:    1000 runs,      0.433 s  (433.125 μs/run)
nonzero_sol...:    1000 runs,      5.427 s  (5.427 ms/run)
argmax_sol....:    1000 runs,     19.128 s  (19.128 ms/run)

N=2^20 (10000 runs) depth=1000, only two True values

gener_sol.....:   10000 runs,      0.962 s  (96.286 μs/run)
nonzero_sol...:   10000 runs,      1.637 s  (163.687 μs/run)
argmax_sol....:   10000 runs,      6.522 s  (652.228 μs/run)

again, the small depth relative to the array size is a key factor in favor of the generator expression.

In conclusion, the right solution depends very much on the scenario. For my particular case in which:

  • Arrays are large (2^20-2^25)
  • The array has multiple True values at unknown locations
  • The first and last True values are relatively close to the edges

the generator expression has the best performance with up to one or two orders of magnitude.

Answered By: Martí
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.