What is a vectorized way to perform a sliding-window sum?
Question:
I have a function with a nested for loop. For each index pair (i, j) of a 2D matrix, it sums all the elements of the 3x3 slice centred on that position, i.e. sum(data[i-1:i+2, j-1:j+2]).
import numpy as np
data=np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
# This is to specify at the edge indices that the sum wraps around
pad_factor=1
data_padded = np.pad(data, pad_factor, mode='wrap')
print(data_padded)
output:
[[16 13 14 15 16 13]
[ 4 1 2 3 4 1]
[ 8 5 6 7 8 5]
[12 9 10 11 12 9]
[16 13 14 15 16 13]
[ 4 1 2 3 4 1]]
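To make the mode='wrap' padding concrete: it is equivalent to indexing data with row and column indices taken modulo the array size. The short check below is an illustrative sketch (the names rows, cols and wrapped are not from the original) that rebuilds data_padded this way with np.ix_:

```python
import numpy as np

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
pad_factor = 1
data_padded = np.pad(data, pad_factor, mode='wrap')

# Indices of the padded rows/columns, mapped back into data with modulo:
# -1 % 4 == 3 and 4 % 4 == 0, so the edges wrap around
n_rows, n_cols = data.shape
rows = np.arange(-pad_factor, n_rows + pad_factor) % n_rows
cols = np.arange(-pad_factor, n_cols + pad_factor) % n_cols

# np.ix_ builds an open mesh, selecting every (row, col) combination
wrapped = data[np.ix_(rows, cols)]

print(rows)                                   # [3 0 1 2 3 0]
print(np.array_equal(wrapped, data_padded))   # True
```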
result=np.zeros((np.shape(data)))
for i in range(0,np.shape(data)[0]):
    for j in range(0,np.shape(data)[1]):
        result[i,j] = np.sum(data_padded[i-1+pad_factor:i+1+pad_factor+1, j-1+pad_factor:j+1+pad_factor+1])
print(result)
output:
[[69. 66. 75. 72.]
[57. 54. 63. 60.]
[93. 90. 99. 96.]
[81. 78. 87. 84.]]
However, on a larger array this takes far too long. So I’d like to vectorize it. I’ve tried creating a meshgrid, then inputting these arrays into the formula:
i, j = np.mgrid[0:np.shape(data)[0],0:np.shape(data)[1]]
result=np.sum(data_padded[i-1:i+1+1,j-1:j+1+1])
This produces the error:
TypeError: only integer scalar arrays can be converted to a scalar index
It seems NumPy does not accept arrays as the start/stop values of a slice.
However, the same method works to take a single element in the matrix, for example:
i, j = np.mgrid[0:np.shape(data)[0]-1,0:np.shape(data)[1]-1]
result=data[i,j]
print(result)
output:
[[ 1 2 3]
[ 5 6 7]
[ 9 10 11]]
So I’d like to know if there is a way to accomplish this.
I’m also interested in solutions for vectorizing the original problem.
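For the original problem, one vectorized option (a swapped-in technique using np.roll, not the stride-based approach) relies on the wrap-around boundary being periodic. np.roll shifts a whole array and wraps elements that fall off one edge back onto the other, so summing the nine copies shifted by -1, 0 and +1 along each axis gives every 3x3 neighbourhood sum at once. The Python loop then runs 9 times instead of once per element, and no padding is needed. This is a sketch, checked only against the 4x4 example:

```python
import numpy as np

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])

# result[i, j] = sum of data[(i - di) % 4, (j - dj) % 4] over di, dj in {-1, 0, 1},
# i.e. the 3x3 wrap-around neighbourhood of (i, j)
result = np.zeros_like(data)
for di in (-1, 0, 1):
    for dj in (-1, 0, 1):
        result += np.roll(data, shift=(di, dj), axis=(0, 1))

print(result)
```

The work is 9 whole-array additions, so the per-element Python overhead of the nested loop disappears.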
Answers:
This is a sliding-window task. The stride_tricks submodule has tools for this that use strides to create a multidimensional view. In this case we make a (4,4,3,3) view and sum over the last 2 dimensions:
In [72]: np.lib.stride_tricks.sliding_window_view(data_padded,(3,3)).sum(axis=(2,3))
Out[72]:
array([[69, 66, 75, 72],
[57, 54, 63, 60],
[93, 90, 99, 96],
[81, 78, 87, 84]])
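A minimal sketch generalizing this to any odd window size k, assuming NumPy 1.20 or newer (the release that added sliding_window_view). The helper names box_sum_wrap and box_sum_loop are hypothetical; the loop version only reproduces the question's code so the two can be compared:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_sum_wrap(data, k=3):
    """Hypothetical helper: sum each k x k neighbourhood (k odd), wrapping at the edges."""
    if k % 2 != 1:
        raise ValueError("k must be odd so the window is centred")
    pad = k // 2
    padded = np.pad(data, pad, mode='wrap')
    # View of shape (rows, cols, k, k): one k x k window per output cell, no copy
    windows = sliding_window_view(padded, (k, k))
    return windows.sum(axis=(2, 3))


def box_sum_loop(data, k=3):
    """Reference: the question's nested loop, generalized to k, for checking."""
    pad = k // 2
    padded = np.pad(data, pad, mode='wrap')
    out = np.zeros(data.shape, dtype=data.dtype)
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].sum()
    return out


data = np.arange(1, 17).reshape(4, 4)   # same values as the question's data
print(box_sum_wrap(data))

rng = np.random.default_rng(0)
big = rng.integers(0, 10, size=(50, 60))
print(np.array_equal(box_sum_wrap(big, 5), box_sum_loop(big, 5)))  # True
```

The view avoids copying the windows, but the sum still performs k*k additions per output cell.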
edit
To simplify your example, let's try 1-D indexing:
In [93]: x=np.arange(10,100,10);x
Out[93]: array([10, 20, 30, 40, 50, 60, 70, 80, 90])
Iteratively, we can get a set of 3-element windows with:
In [94]: [x[i:i+3] for i in range(5)]
Out[94]:
[array([10, 20, 30]),
array([20, 30, 40]),
array([30, 40, 50]),
array([40, 50, 60]),
array([50, 60, 70])]
But as you found, slicing does not work with arrays as the start/stop values:
In [96]: i = np.arange(0,5); x[i:i+3]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[96], line 1
----> 1 i = np.arange(0,5); x[i:i+3]
TypeError: only integer scalar arrays can be converted to a scalar index
We could, though, create an array of indices (not slices):
In [97]: idx = np.arange(5)[:,None]+np.arange(3) # np.linspace also works
In [98]: idx
Out[98]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
In [99]: x[idx]
Out[99]:
array([[10, 20, 30],
[20, 30, 40],
[30, 40, 50],
[40, 50, 60],
[50, 60, 70]])
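The same index-array trick extends to 2-D, and it is essentially a repaired version of the mgrid attempt in the question: rather than slicing with arrays, add a grid of window offsets to a grid of window origins so that broadcasting yields a (4,4,3,3) array of indices. This is an illustrative sketch; unlike sliding_window_view, advanced indexing copies all 4*4*9 values, so it uses more memory on large inputs:

```python
import numpy as np

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
data_padded = np.pad(data, 1, mode='wrap')

# Top-left corner of each window in the padded array, one per output cell
i, j = np.mgrid[0:data.shape[0], 0:data.shape[1]]   # each (4, 4)
# Offsets inside a 3x3 window
di, dj = np.mgrid[0:3, 0:3]                          # each (3, 3)

# Broadcast (4, 4, 1, 1) + (3, 3) -> (4, 4, 3, 3): rows[a, b, c, d] = a + c
rows = i[:, :, None, None] + di
cols = j[:, :, None, None] + dj

windows = data_padded[rows, cols]   # advanced indexing: this makes a copy
result = windows.sum(axis=(2, 3))
print(result)
```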
In [100]: np.lib.stride_tricks.sliding_window_view(x,3)
Out[100]:
array([[10, 20, 30],
[20, 30, 40],
[30, 40, 50],
[40, 50, 60],
[50, 60, 70],
[60, 70, 80],
[70, 80, 90]])
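One practical difference from x[idx]: sliding_window_view returns a read-only view that shares memory with x instead of copying it. A small sketch checking this with np.shares_memory and the flags attribute:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(10, 100, 10)
v = sliding_window_view(x, 3)

print(v.shape)                  # (7, 3)
print(np.shares_memory(v, x))   # True: no data was copied
print(v.flags.writeable)        # False: read-only by default

# A change to x is visible through the view
x[3] = -1
print(v[1].tolist())            # [20, 30, -1]
```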
In [101]: _.strides
Out[101]: (4, 4)
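The strides are 4 bytes, i.e. one element, in both directions (this session evidently used a 32-bit default integer; with int64 they would be (8, 8)). By contrast, x reshaped to a normal (3,3) array steps 3 elements down the rows: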
In [105]: x.reshape(3,3).strides
Out[105]: (12, 4)
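To show how such a view is built from strides, here is a sketch that constructs the same 1-D window view by hand with np.lib.stride_tricks.as_strided. The element size s is read from x.strides, so the check does not depend on whether the default integer is 32 or 64 bits. as_strided performs no bounds checking, so a wrong shape or stride silently reads memory outside the array; sliding_window_view is the safer choice in real code:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

x = np.arange(10, 100, 10)
s = x.strides[0]             # bytes per element (4 for int32, 8 for int64)

n_windows = x.size - 3 + 1   # 7 windows of length 3
# Step one element between windows (rows) AND one element within a window (cols)
manual = as_strided(x, shape=(n_windows, 3), strides=(s, s), writeable=False)

print(np.array_equal(manual, sliding_window_view(x, 3)))  # True
print(manual.strides == (s, s))                            # True
# A plain reshape steps a whole row (3 elements) between rows
print(x.reshape(3, 3).strides == (3 * s, s))               # True
```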