# Numpy first occurrence of value greater than existing value

## Question:

I have a 1D array in numpy and I want to find the position of the index where a value exceeds the value in numpy array.

E.g.

``````aa = range(-10,10)
``````

Find position in `aa` where, the value `5` gets exceeded.

This is a little faster (and looks nicer)

``````np.argmax(aa>5)
``````

Since `argmax` will stop at the first `True` (“In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.”) and doesn’t save another list.

``````In : N = 10000

In : aa = np.arange(-N,N)

In : timeit np.argmax(aa>N/2)
100000 loops, best of 3: 52.3 us per loop

In : timeit np.where(aa>N/2)
10000 loops, best of 3: 141 us per loop

In : timeit np.nonzero(aa>N/2)
10000 loops, best of 3: 142 us per loop
``````
``````In : a=np.arange(-10,10)

In : a
Out:
array([-10,  -9,  -8,  -7,  -6,  -5,  -4,  -3,  -2,  -1,   0,   1,   2,
3,   4,   5,   6,   7,   8,   9])

In : np.where(a>5)
Out: (array([16, 17, 18, 19]),)

In : np.where(a>5)
Out: 16
``````

given the sorted content of your array, there is an even faster method: searchsorted.

``````import time
N = 10000
aa = np.arange(-N,N)
%timeit np.searchsorted(aa, N/2)+1
%timeit np.argmax(aa>N/2)
%timeit np.where(aa>N/2)
%timeit np.nonzero(aa>N/2)

# Output
100000 loops, best of 3: 5.97 µs per loop
10000 loops, best of 3: 46.3 µs per loop
10000 loops, best of 3: 154 µs per loop
10000 loops, best of 3: 154 µs per loop
``````

I was also interested in this and I’ve compared all the suggested answers with perfplot. (Disclaimer: I’m the author of perfplot.)

If you know that the array you’re looking through is already sorted, then

``````numpy.searchsorted(a, alpha)
``````

is for you. It’s O(log(n)) operation, i.e., the speed hardly depends on the size of the array. You can’t get faster than that.

If you don’t know anything about your array, you’re not going wrong with

``````numpy.argmax(a > alpha)
`````` Unsorted: Code to reproduce the plot:

``````import numpy
import perfplot

alpha = 0.5
numpy.random.seed(0)

def argmax(data):
return numpy.argmax(data > alpha)

def where(data):
return numpy.where(data > alpha)

def nonzero(data):
return numpy.nonzero(data > alpha)

def searchsorted(data):
return numpy.searchsorted(data, alpha)

perfplot.save(
"out.png",
# setup=numpy.random.rand,
setup=lambda n: numpy.sort(numpy.random.rand(n)),
kernels=[argmax, where, nonzero, searchsorted],
n_range=[2 ** k for k in range(2, 23)],
xlabel="len(array)",
)
``````

I would go with

``````i = np.min(np.where(V >= x))
``````

where `V` is vector (1d array), `x` is the value and `i` is the resulting index.

# Arrays that have a constant step between elements

In case of a `range` or any other linearly increasing array you can simply calculate the index programmatically, no need to actually iterate over the array at all:

``````def first_index_calculate_range_like(val, arr):
if len(arr) == 0:
raise ValueError('no value greater than {}'.format(val))
elif len(arr) == 1:
if arr > val:
return 0
else:
raise ValueError('no value greater than {}'.format(val))

first_value = arr
step = arr - first_value
# For linearly decreasing arrays or constant arrays we only need to check
# the first element, because if that does not satisfy the condition
# no other element will.
if step <= 0:
if first_value > val:
return 0
else:
raise ValueError('no value greater than {}'.format(val))

calculated_position = (val - first_value) / step

if calculated_position < 0:
return 0
elif calculated_position > len(arr) - 1:
raise ValueError('no value greater than {}'.format(val))

return int(calculated_position) + 1
``````

One could probably improve that a bit. I have made sure it works correctly for a few sample arrays and values but that doesn’t mean there couldn’t be mistakes in there, especially considering that it uses floats…

``````>>> import numpy as np
>>> first_index_calculate_range_like(5, np.arange(-10, 10))
16
>>> np.arange(-10, 10)  # double check
6

>>> first_index_calculate_range_like(4.8, np.arange(-10, 10))
15
``````

Given that it can calculate the position without any iteration it will be constant time (`O(1)`) and can probably beat all other mentioned approaches. However it requires a constant step in the array, otherwise it will produce wrong results.

# General solution using numba

A more general approach would be using a numba function:

``````@nb.njit
def first_index_numba(val, arr):
for idx in range(len(arr)):
if arr[idx] > val:
return idx
return -1
``````

That will work for any array but it has to iterate over the array, so in the average case it will be `O(n)`:

``````>>> first_index_numba(4.8, np.arange(-10, 10))
15
>>> first_index_numba(5, np.arange(-10, 10))
16
``````

# Benchmark

Even though Nico Schlömer already provided some benchmarks I thought it might be useful to include my new solutions and to test for different “values”.

The test setup:

``````import numpy as np
import math
import numba as nb

def first_index_using_argmax(val, arr):
return np.argmax(arr > val)

def first_index_using_where(val, arr):
return np.where(arr > val)

def first_index_using_nonzero(val, arr):
return np.nonzero(arr > val)

def first_index_using_searchsorted(val, arr):
return np.searchsorted(arr, val) + 1

def first_index_using_min(val, arr):
return np.min(np.where(arr > val))

def first_index_calculate_range_like(val, arr):
if len(arr) == 0:
raise ValueError('empty array')
elif len(arr) == 1:
if arr > val:
return 0
else:
raise ValueError('no value greater than {}'.format(val))

first_value = arr
step = arr - first_value
if step <= 0:
if first_value > val:
return 0
else:
raise ValueError('no value greater than {}'.format(val))

calculated_position = (val - first_value) / step

if calculated_position < 0:
return 0
elif calculated_position > len(arr) - 1:
raise ValueError('no value greater than {}'.format(val))

return int(calculated_position) + 1

@nb.njit
def first_index_numba(val, arr):
for idx in range(len(arr)):
if arr[idx] > val:
return idx
return -1

funcs = [
first_index_using_argmax,
first_index_using_min,
first_index_using_nonzero,
first_index_calculate_range_like,
first_index_numba,
first_index_using_searchsorted,
first_index_using_where
]

from simple_benchmark import benchmark, MultiArgument
``````

and the plots were generated using:

``````%matplotlib notebook
b.plot()
``````

## item is at the beginning

``````b = benchmark(
funcs,
{2**i: MultiArgument([0, np.arange(2**i)]) for i in range(2, 20)},
argument_name="array size")
`````` The numba function performs best followed by the calculate-function and the searchsorted function. The other solutions perform much worse.

## item is at the end

``````b = benchmark(
funcs,
{2**i: MultiArgument([2**i-2, np.arange(2**i)]) for i in range(2, 20)},
argument_name="array size")
`````` For small arrays the numba function performs amazingly fast, however for bigger arrays it’s outperformed by the calculate-function and the searchsorted function.

### item is at sqrt(len)

``````b = benchmark(
funcs,
{2**i: MultiArgument([np.sqrt(2**i), np.arange(2**i)]) for i in range(2, 20)},
argument_name="array size")
`````` This is more interesting. Again numba and the calculate function perform great, however this is actually triggering the worst case of searchsorted which really doesn’t work well in this case.

## Comparison of the functions when no value satisfies the condition

Another interesting point is how these function behave if there is no value whose index should be returned:

``````arr = np.ones(100)
value = 2

for func in funcs:
print(func.__name__)
try:
print('-->', func(value, arr))
except Exception as e:
print('-->', e)
``````

With this result:

``````first_index_using_argmax
--> 0
first_index_using_min
--> zero-size array to reduction operation minimum which has no identity
first_index_using_nonzero
--> index 0 is out of bounds for axis 0 with size 0
first_index_calculate_range_like
--> no value greater than 2
first_index_numba
--> -1
first_index_using_searchsorted
--> 101
first_index_using_where
--> index 0 is out of bounds for axis 0 with size 0
``````

Searchsorted, argmax, and numba simply return a wrong value. However `searchsorted` and `numba` return an index that is not a valid index for the array.

The functions `where`, `min`, `nonzero` and `calculate` throw an exception. However only the exception for `calculate` actually says anything helpful.

That means one actually has to wrap these calls in an appropriate wrapper function that catches exceptions or invalid return values and handle appropriately, at least if you aren’t sure if the value could be in the array.

Note: The calculate and `searchsorted` options only work in special conditions. The “calculate” function requires a constant step and the searchsorted requires the array to be sorted. So these could be useful in the right circumstances but aren’t general solutions for this problem. In case you’re dealing with sorted Python lists you might want to take a look at the bisect module instead of using Numpys searchsorted.

I’d like to propose

``````np.min(np.append(np.where(aa>5),np.inf))
``````

This will return the smallest index where the condition is met, while returning infinity if the condition is never met (and `where` returns an empty array).

You should use `np.where` instead of `np.argmax`. The latter will return position 0 even if no value is found, which is not the indexes you expect.

``````>>> aa = np.array(range(-10,10))
>>> print(aa)
array([-10,  -9,  -8,  -7,  -6,  -5,  -4,  -3,  -2,  -1,   0,   1,   2,
3,   4,   5,   6,   7,   8,   9])
``````

If the condition is met, it returns an array of the indexes.

``````>>> idx = np.where(aa > 5)
>>> print(idx)
array([16, 17, 18, 19], dtype=int64)
``````

Otherwise, if not met, it returns an empty array.

``````>>> not_found = len(np.where(aa > 20))
>>> print(not_found)
array([], dtype=int64)
``````

The point against `argmax` for this case is: the simpler the best, IF the solution is not ambiguous. So, to check if something fell into the condition, just do a `if len(np.where(aa > value_to_search)) > 0`.

Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.