How to get the indices of at least two consecutive values that are all greater than a threshold?

Question

For example, let’s consider the following numpy array:

[1, 5, 0, 5, 4, 6, 1, -1, 5, 10]

Also, let’s suppose that the threshold is equal to 3.
That is to say that we are looking for sequences of at least two consecutive values that are all above the threshold.

The output would be the indices of those values, which in our case is:

[[3, 4, 5], [8, 9]]

If the output array was flattened that would work as well!

[3, 4, 5, 8, 9]

Output Explanation

In our initial array we can see that for index = 1 we have the value 5, which is greater than the threshold, but is not part of a sequence (of at least two values) where every value is greater than the threshold. That’s why this index would not make it to our output.

On the other hand, for indices [3, 4, 5] we have a sequence of (at least two) neighboring values [5, 4, 6] where each and every of them are above the threshold and that’s the reason that their indices are included in the final output!

My Code so far

I have approached the issue with something like this:

(arr > 3).nonzero()

The above command gathers the indices of all the items that are above the threshold. However, I cannot determine if they are consecutive or not. I have thought of trying a diff on the outcome of the above snippet and then may be locating ones (that is to say that indices are one after the other). Which would give us:

np.diff((arr > 3).nonzero())

But I’d still be missing something here.

Asked By: ttsak

||

Source

Answer 1

The below iteration code should help with O(n) complexity

arr = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]

threshold = 3
sequence = 2

output = []
temp_arr = []        

for i in range(len(arr)):
    if arr[i] > threshold:
        temp_arr.append(i)
    else:
        if len(temp_arr) >= sequence:
            output.append(temp_arr)
        temp_arr = []
if len(temp_arr):
    output.append(temp_arr)
    temp_arr = []
    
print(output)

# Output
# [[3, 4, 5], [8, 9]]

Answered By: Abhi

Answer 2

If you convolve a boolean array with a window full of 1 of size win_size ([1] * win_size), then you will obtain an array where there is the value win_size where the condition held for win_size items:

import numpy as np

def groups(arr, *, threshold, win_size, merge_contiguous=False, flat=False):
    conv = np.convolve((arr >= threshold).astype(int), [1] * win_size, mode="valid")
    indexes_start = np.where(conv == win_size)[0]
    indexes = [np.arange(index, index + win_size) for index in indexes_start]
    
    if flat or merge_contiguous:
        indexes = np.unique(indexes)
        if merge_contiguous:
            indexes = np.split(indexes, np.where(np.diff(indexes) != 1)[0] + 1)
    return indexes


arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
threshold = 3
win_size = 2

print(groups(arr, threshold=threshold, win_size=win_size))
print(groups(arr, threshold=threshold, win_size=win_size, merge_contiguous=True))
print(groups(arr, threshold=threshold, win_size=win_size, flat=True))

[array([3, 4]), array([4, 5]), array([8, 9])]
[array([3, 4, 5]), array([8, 9])]
[3 4 5 8 9]

Answered By: paime

Answer 3

You can do what you want using simple numpy operations

import numpy as np

arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])

arr_padded = np.concatenate(([0], arr, [0]))
a = np.where(arr_padded > 3, 1, 0)

da = np.diff(a)

idx_start = (da == 1).nonzero()[0]
idx_stop = (da == -1).nonzero()[0]

valid = (idx_stop - idx_start >= 2).nonzero()[0]

result = [list(range(idx_start[i], idx_stop[i])) for i in valid]
print(result)

Explanation

Array a is a padded binary version of the original array, with 1s where the original elements are greater than three. da contains 1s where "islands" of 1s begin in a, and -1 where the "islands" end in a. Due to the padding, there is guaranteed to be an equal number of 1s and -1s in da. Extracting their indices, we can calculate the length of the islands. Valid index pairs are those whose respective "islands" have length >= 2. Then, its just a matter of generating all numbers between the index bounds of the valid "islands".

Answered By: Alex P

Answer 4

I would suggest using a for loop with two indces. You will have one that starts at j=1 and the other at i=0, both stepping forward by 1.
You can then ask if the value at both is greater than the threshold, if so
add the indices to a list and keep moving forward with j until the threshold or .next() is not greater than threshhold.

values = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
res=[]
threshold= 3

i=0
j=0
for _ in values:
  j=i+1
  lista=[]
  try:
      print(f"i: {i} j:{j}")
      # check if condition is met
      if(values[i] > threshold and values[j] > threshold):
        lista.append(i)
        # add sequence 
        while values[j] > threshold:
          lista.append(j)
          print(f"j while: {j}")
          j+=1
          if(j>=len(values)):
            break
        res.append(lista)
        
      i=j
      if(j>=len(values)): 
        break
  except:
    print("ex")

this works. but needs refactoring

Answered By: SirLoort

Answer 5

I follow your original idea. You are almost done.

I use another diff2 to pick the index of the first value in a sequence. See comments in code for details.

import numpy as np

arr = np.array([ 1,  5,  0,  5,  4,  6,  1, -1,  5, 10])
threshold = 3

all_idx = (arr > threshold).nonzero()[0]
# array([1, 3, 4, 5, 8, 9])

result = np.empty(0)
if all_idx.size > 1:
    diff1 = np.zeros_like(all_idx)
    diff1[1:] = np.diff(all_idx)
    # array([0, 2, 1, 1, 3, 1])
    diff1[0] = diff1[1]
    # array([2, 2, 1, 1, 3, 1])
    # **Positions with a value 1 in diff1 should be reserved.**

    # But we also want the position before each 1. Create another diff2
    diff2 = np.zeros_like(all_idx)
    diff2[:-1] = np.diff(diff1)
    # array([ 2, -1,  0,  2, -2,  0])
    # **Positions with a negative value in diff2 should be reserved.**

    result = all_idx[(diff1==1) | (diff2<0)]

print(result)
# array([3, 4, 5, 8, 9])

Answered By: ILS

Answer 6

Let’s try the following code:

# Simple is better than complex
# Complex is better than complicated

arr = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]

arr_3=[i if arr[i]>3 else 'a' for i in range(len(arr))]

arr_4=''.join(str(x) for x in arr_3)

i=0

while i<len(arr_5):
    if len(arr_5[i]) <=1:
        del arr_5[i]
    else:
        i+=1
        
arr_6=[list(map(lambda x: int(x), list(x))) for x in arr_5]

print(arr_6)

Outputs:

[[3, 4, 5], [8, 9]]

Answered By: Khaled DELLAL

Answer 7

I’ll try something different using window views, I’m not sure this works all the time so counterexamples are welcome. It has the advantage of not requiring Python loops.

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as window


def consec_thresh(arr, thresh):
    win = window(np.argwhere(arr > thresh), (2, 1))
    return np.unique(win[np.diff(win, axis=2).ravel() == 1, :,:].ravel())

How does it work?

So we start with the array and gather the indices where the threshold is met:

In [180]: np.argwhere(arr > 3)
Out[180]:
array([[1],
       [3],
       [4],
       [5],
       [8],
       [9]])

Then we build a sliding window that makes up pair of values along the column (which is the reason for the (2, 1) shape of the window).

In [181]: window(np.argwhere(arr > 3), (2, 1))
Out[181]:
array([[[[1],
         [3]]],


       [[[3],
         [4]]],


       [[[4],
         [5]]],


       [[[5],
         [8]]],


       [[[8],
         [9]]]])

Now we want to take the difference inside each pair, if it’s one then the indices are consecutive.

In [182]: np.diff(window(np.argwhere(arr > 3), (2, 1)), axis=2)
Out[182]:
array([[[[2]]],


       [[[1]]],


       [[[1]]],


       [[[3]]],


       [[[1]]]])

We can plug those values back in the windows we created above,

In [185]: window(np.argwhere(arr > 3), (2, 1))[np.diff(window(np.argwhere(arr > 3), (2, 1)), axis=2).ravel() == 1, :, :]
Out[185]:
array([[[[3],
         [4]]],


       [[[4],
         [5]]],


       [[[8],
         [9]]]])

Then we can ravel (flatten without copy when possible), we have to get rid of the repeated indices created by windowing so I call np.unique. We ravel again and get:

 array([3, 4, 5, 8, 9])

Answered By: Nathan Furnal

Answer 8

Here is a solution that makes use of pandas Series:

thresh = 3
win_size = 2

s = pd.Series(arr)

# locating groups of values where there are at least (win_size) consecutive values above the threshold
groups = s.groupby(s.le(thresh).cumsum().loc[s.gt(thresh)]).transform('count').ge(win_size)

0    False
1    False
2    False
3     True
4     True
5     True
6    False
7    False
8     True
9     True
dtype: bool

We can now easily take their indices in a 1D array:

np.flatnonzero(groups)

# array([3, 4, 5, 8, 9], dtype=int64)

OR multiple lists:

[np.arange(index.start, index.stop) for index in np.ma.clump_unmasked(np.ma.masked_not_equal(groups.values, value=True))]

# [array([3, 4, 5], dtype=int64), array([8, 9], dtype=int64)]

Answered By: ttsak

How to get the indices of at least two consecutive values that are all greater than a threshold?

Question:

Output Explanation

My Code so far

Answers:

How does it work?