find index of n consecutive values greater than zero with the largest sum from a numpy array (or pandas Series)

Question

So here is my problem: I have an array like this:
import numpy as np
arr = np.array([0, 0, 1, 8, 10, 20, 26, 32, 37, 52, 0, 0, 46, 42, 30, 19, 8, 2, 0, 0, 0])
In this array I want to find the n consecutive values, all greater than zero, with the biggest sum. In this example, with n = 5, this would be array([20, 26, 32, 37, 52]) and the index would be 5.

What I tried is of course a loop:

n = 5
max_sum = 0
max_loc = 0
# + 1 so that the last window, arr[-n:], is also checked
for i in range(arr.size - n + 1):
    window = arr[i:i + n]
    if (window > 0).all() and window.sum() > max_sum:
        max_sum = window.sum()
        max_loc = i
print(max_loc)

This is fine for a few short arrays, but I need to run it on many much longer arrays.

I experimented with numpy so that I would only have to iterate over the groups of non-zero values:

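import numpy as np

# True wherever arr switches between zero and non-zero
diffs = np.concatenate((np.array([False]), np.diff(arr > 0)))
# split arr into alternating runs of zeros and non-zeros
groups = np.split(arr, np.where(diffs)[0])
for group in groups:
    # only non-zero runs at least n long can contain a valid window
    if group.sum() > 0 and group.size >= n:
        ...  # still need to find the best window inside this group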

This is nice, but I believe it is not the right direction. I am looking for a simpler and faster numpy/pandas solution that really uses the power of these packages.
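The grouping idea can be pushed entirely into numpy with prefix sums, a technique the question does not use. The sketch below assumes 1-D numeric input; the function name best_positive_window is hypothetical. One cumulative sum gives every window sum, and a second cumulative sum over the mask arr <= 0 counts the disqualifying values in each window, so there is no Python-level loop over groups or windows.

```python
import numpy as np


def best_positive_window(arr, n):
    """Hypothetical helper: start index of the length-n all-positive window
    with the largest sum, or None if no such window exists."""
    arr = np.asarray(arr)
    if n <= 0 or n > arr.size:
        return None
    # Prefix sums with a leading 0: sum(arr[i:i+n]) == csum[i+n] - csum[i]
    csum = np.concatenate(([0], np.cumsum(arr)))
    # Same trick on a boolean mask counts the values <= 0 in each window
    cbad = np.concatenate(([0], np.cumsum(arr <= 0)))
    sums = csum[n:] - csum[:-n]
    bad = cbad[n:] - cbad[:-n]
    # Start positions of windows that contain only positive values
    candidates = np.flatnonzero(bad == 0)
    if candidates.size == 0:
        return None
    # Best sum among the valid windows, mapped back to its start index
    return int(candidates[np.argmax(sums[candidates])])


arr = np.array([0, 0, 1, 8, 10, 20, 26, 32, 37, 52, 0, 0, 46, 42, 30, 19, 8, 2, 0, 0, 0])
print(best_positive_window(arr, 5))  # 5
```

Everything here is O(len(arr)) and no (windows × n) intermediate array is built. With floating-point data, subtracting large prefix sums can introduce small rounding errors, so near-ties might be resolved differently than with a direct sum.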

Asked By: Arvid


Answer 1

You can use sliding_window_view (available since NumPy 1.20):

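import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

N = 5
# read-only view of shape (len(arr) - N + 1, N): row i is arr[i:i+N], no copy
win = sliding_window_view(arr, N)
# window sums, zeroed out for every window that contains a non-positive value
idx = (win.sum(axis=1) * (win > 0).all(axis=1)).argmax()
print(idx, arr[idx:idx+N])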

# Output
5 [20 26 32 37 52]

Answer greatly improved by chrslg to save memory by keeping win as a view.
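One edge case of the product trick: if no window consists only of positive values, every product is 0 and argmax() silently returns 0. A minimal sketch that keeps the sliding_window_view approach but reports that case, assuming a 1-D array and n <= len(arr) (the helper name best_window_or_none is hypothetical):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def best_window_or_none(arr, n):
    """Hypothetical helper: the sliding_window_view approach, but returning
    None when no window of n positive values exists."""
    # Read-only view of shape (len(arr) - n + 1, n); raises ValueError if n > len(arr)
    win = sliding_window_view(np.asarray(arr), n)
    valid = (win > 0).all(axis=1)
    if not valid.any():
        return None
    # Invalid windows get -inf, so argmax can only pick a valid one
    sums = np.where(valid, win.sum(axis=1), -np.inf)
    return int(sums.argmax())


# With no valid window, the product trick still returns index 0
win = sliding_window_view(np.array([0, 0, 1, 0]), 2)
print((win.sum(axis=1) * (win > 0).all(axis=1)).argmax())  # 0
print(best_window_or_none([0, 0, 1, 0], 2))  # None
```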

Update

A nice bonus: the same idea works just fine with a pandas Series.

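import pandas as pd

N = 5
# Non-positive values become NaN, so rolling(N).sum() is NaN for any window containing one;
# shift(-N+1) labels each sum with its window's start, and idxmax skips the NaNs
idx = pd.Series(arr).where(lambda x: x > 0).rolling(N).sum().shift(-N+1).idxmax()
print(idx, arr[idx:idx+N])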

# Output
5 [20 26 32 37 52]
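Note that idxmax returns an index label, not a position. With the default RangeIndex the two coincide, but for a Series with another index (the daily DatetimeIndex below is a made-up example) the label has to be converted before positional slicing. A sketch under that assumption:

```python
import numpy as np
import pandas as pd

arr = np.array([0, 0, 1, 8, 10, 20, 26, 32, 37, 52, 0, 0, 46, 42, 30, 19, 8, 2, 0, 0, 0])
N = 5

# Hypothetical daily index, only to show label vs. position
s = pd.Series(arr, index=pd.date_range("2024-01-01", periods=arr.size, freq="D"))

# Same chain as in the answer; idxmax now returns a Timestamp label
label = s.where(lambda x: x > 0).rolling(N).sum().shift(-N + 1).idxmax()
# Convert the label back to a position before slicing
pos = s.index.get_loc(label)
print(label, pos, s.iloc[pos:pos + N].to_numpy())
```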

Answered By: Corralien

Answer 2

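Using cross-correlation (numpy.correlate) gives a concise and fast way to find the window with the largest sum (note that, as written, it does not check that the values are greater than zero):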

import numpy as np

n = 5
# sum of every length-n window (correlation with a kernel of n ones)
idx = np.argmax(np.correlate(arr, np.ones(n), 'valid'))
idx, arr[idx:idx+n]

Another possible solution:

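n, l = 5, arr.size
# np.roll(arr, -x)[:n] equals arr[x:x+n] for x <= l - n, but copies the whole array on every iteration
idx = np.argmax([np.sum(np.roll(arr, -x)[:n]) for x in range(l-n+1)])
idx, arr[idx:(idx+n)]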

Output:

(5, array([20, 26, 32, 37, 52]))
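Both versions above maximize the plain window sum, so a window containing zeros can win. They happen to give the right answer for this particular array, but not in general. A minimal sketch that keeps numpy.correlate and adds the positivity check with a second correlation over the mask arr <= 0; the helper name best_window_correlate and the test array arr2 are hypothetical:

```python
import numpy as np


def best_window_correlate(arr, n):
    """Hypothetical helper: window sums via np.correlate, plus a second
    correlate that counts non-positive values in each window."""
    arr = np.asarray(arr, dtype=float)
    kernel = np.ones(n)
    sums = np.correlate(arr, kernel, "valid")
    # Number of entries <= 0 in each window; only 0 means the window qualifies
    bad = np.correlate((arr <= 0).astype(float), kernel, "valid")
    if not (bad == 0).any():
        return None
    return int(np.argmax(np.where(bad == 0, sums, -np.inf)))


arr2 = np.array([0, 9, 9, 9, 9, 0, 1, 2, 3, 4, 5])
print(np.argmax(np.correlate(arr2, np.ones(5), "valid")))  # 0: window [0, 9, 9, 9, 9] contains a zero
print(best_window_correlate(arr2, 5))  # 6: window [1, 2, 3, 4, 5]
```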

Answered By: PaulS
