What is a vectorized way to perform a sliding window

Question:

I have a function built on a nested for loop. For each index i and j of a 2D matrix, it sums all the elements of a 3x3 slice of the array centred on that index, as in np.sum(data[i-1:i+2, j-1:j+2]).

import numpy as np


data=np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])

# Pad with wrapped values so that the sum wraps around at the edge indices
pad_factor=1
data_padded = np.pad(data, pad_factor, mode='wrap')
print(data_padded)
output:
[[16 13 14 15 16 13]
 [ 4  1  2  3  4  1]
 [ 8  5  6  7  8  5]
 [12  9 10 11 12  9]
 [16 13 14 15 16 13]
 [ 4  1  2  3  4  1]]

result=np.zeros((np.shape(data)))
for i in range(0,np.shape(data)[0]):
    for j in range(0,np.shape(data)[1]):
        result[i,j] =  np.sum(data_padded[i-1+pad_factor:i+1+pad_factor+1, j-1+pad_factor:j+1+pad_factor+1])

print(result)

output:
[[69. 66. 75. 72.]
 [57. 54. 63. 60.]
 [93. 90. 99. 96.]
 [81. 78. 87. 84.]]

However, on a larger array this takes far too long. So I’d like to vectorize it. I’ve tried creating a meshgrid, then inputting these arrays into the formula:

i, j = np.mgrid[0:np.shape(data)[0],0:np.shape(data)[1]]
result=np.sum(data_padded[i-1:i+1+1,j-1:j+1+1])

This produces the error:

TypeError: only integer scalar arrays can be converted to a scalar index

NumPy does not accept arrays as the start/stop values of a slice.

However, the same method works to take a single element in the matrix, for example:

i, j = np.mgrid[0:np.shape(data)[0]-1,0:np.shape(data)[1]-1]

result=data[i,j]
print(result)

output:
[[ 1  2  3]
 [ 5  6  7]
 [ 9 10 11]]

So I’d like to know if there is a way to accomplish this.

I’m also interested in solutions for vectorizing the original problem.

Asked By: notAI


Answers:

This is a sliding window task. The np.lib.stride_tricks submodule has tools that use strides to create a multidimensional view of the windows. In this case we make a (4,4,3,3) view of data_padded and sum over the last 2 dimensions:

In [72]: np.lib.stride_tricks.sliding_window_view(data_padded,(3,3)).sum(axis=(2,3))
Out[72]: 
array([[69, 66, 75, 72],
       [57, 54, 63, 60],
       [93, 90, 99, 96],
       [81, 78, 87, 84]])
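
As a self-contained check, the one-liner above can be compared against the double loop from the question. This is only a sketch: the names `pad`, `windows` and `expected` are mine, not from the question, and `sliding_window_view` needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(1, 17).reshape(4, 4)
pad = 1  # window half-width; a 3x3 window needs one wrapped row/column per side
data_padded = np.pad(data, pad, mode="wrap")

# (6, 6) padded array -> (4, 4, 3, 3) view: one 3x3 window per original cell
windows = sliding_window_view(data_padded, (2 * pad + 1, 2 * pad + 1))
result = windows.sum(axis=(2, 3))

# Reference: the explicit double loop from the question, on the padded array
expected = np.zeros_like(data)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        expected[i, j] = data_padded[i:i + 3, j:j + 3].sum()

print(windows.shape)                     # (4, 4, 3, 3)
print(np.array_equal(result, expected))  # True
```

Because the windows are a view, no 3x3 blocks are copied before the sum; only the (4,4) result is allocated.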

edit

To simplify your example, let's try 1d indexing:

In [93]: x=np.arange(10,100,10);x
Out[93]: array([10, 20, 30, 40, 50, 60, 70, 80, 90])

Iteratively, we can get a set of 3-element windows with:

In [94]: [x[i:i+3] for i in range(5)]
Out[94]: 
[array([10, 20, 30]),
 array([20, 30, 40]),
 array([30, 40, 50]),
 array([40, 50, 60]),
 array([50, 60, 70])]

But as you found, slicing does not work with arrays as the start/stop values:

In [96]: i = np.arange(0,5); x[i:i+3]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[96], line 1
----> 1 i = np.arange(0,5); x[i:i+3]

TypeError: only integer scalar arrays can be converted to a scalar index

We could, though, create an array of indices (not slices) with:

In [97]: idx = np.arange(5)[:,None]+np.arange(3)  # np.linspace also works    
In [98]: idx
Out[98]: 
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])    
In [99]: x[idx]
Out[99]: 
array([[10, 20, 30],
       [20, 30, 40],
       [30, 40, 50],
       [40, 50, 60],
       [50, 60, 70]])
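
The same index-array idea should carry over to the 2d case in the question. A minimal sketch, assuming the 3x3 wrapped window from the question; `rows`, `cols` and `result_idx` are names I made up for illustration:

```python
import numpy as np

data = np.arange(1, 17).reshape(4, 4)
data_padded = np.pad(data, 1, mode="wrap")

# rows[i, a] = i + a: padded row index for window i at offset a, shape (4, 3)
rows = np.arange(data.shape[0])[:, None] + np.arange(3)
cols = np.arange(data.shape[1])[:, None] + np.arange(3)

# Broadcast the two index arrays to (4, 4, 3, 3);
# the axes are (i, j, row offset, column offset)
windows = data_padded[rows[:, None, :, None], cols[None, :, None, :]]
result_idx = windows.sum(axis=(2, 3))
print(result_idx)
```

Unlike the strided view, advanced indexing copies every window, so this uses roughly 9 times the memory of the padded array for a 3x3 window.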

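sliding_window_view builds the same windows as a strided view (all 7 possible ones, not just the first 5):

In [100]: np.lib.stride_tricks.sliding_window_view(x,3)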
Out[100]: 
array([[10, 20, 30],
       [20, 30, 40],
       [30, 40, 50],
       [40, 50, 60],
       [50, 60, 70],
       [60, 70, 80],
       [70, 80, 90]])

In [101]: _.strides
Out[101]: (4, 4)

The strides are 4 bytes, i.e. one element (the integers here are 4 bytes wide), along both axes. By contrast, x reshaped to an ordinary (3,3) array steps 3 elements (12 bytes) from one row to the next:

In [105]: x.reshape(3,3).strides
Out[105]: (12, 4)
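
The byte counts above depend on the integer width of the platform, so a portable way to see the same thing is to compare the strides with `x.itemsize`. A small sketch (the variable `w` is just my name for the window view):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(10, 100, 10)
w = sliding_window_view(x, 3)

# Both strides equal one element: moving to the next window, or to the next
# item inside a window, advances one element in x's buffer.
print(w.strides == (x.itemsize, x.itemsize))

# The windows are a read-only view on x's memory, not a copy.
print(np.shares_memory(w, x))
print(w.flags.writeable)

# A plain reshape steps a whole row (3 elements) per row instead.
print(x.reshape(3, 3).strides == (3 * x.itemsize, x.itemsize))
```

The overlapping windows share memory, which is why the view is returned read-only by default: writing to one window would silently change its neighbours.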
Answered By: hpaulj