Split a numpy array into nonoverlapping arrays
Question:
I am trying to split a 2D numpy array that is not square into nonoverlapping chunks of smaller 2D numpy arrays. Example – split a 3×4 array into chunks of 2×2:
The array to be split:
[[ 34  15  16  17]
 [ 78  98  99 100]
 [ 23  78  79  80]]
This should output:
[[34 15]
 [78 98]]
[[ 16  17]
 [ 99 100]]
So the last row, [23 78 79 80], is dropped because it cannot fill a complete 2×2 chunk.
My current code is this:
import numpy as np
new_array = np.array([[34,15,16,17], [78,98,99,100], [23,78,79,80]])
window = 2
for x in range(0, new_array.shape[0], window):
    for y in range(0, new_array.shape[1], window):
        patch_im1 = new_array[x:x+window, y:y+window]
        print(patch_im1)
This outputs:
[[34 15]
 [78 98]]
[[ 16  17]
 [ 99 100]]
[[23 78]]
[[79 80]]
Ideally, I would like to have the chunks stored in a list.
Answers:
I am not sure which input sizes you need to handle, but this approach should work with any 2D input size and any chunk shape. It uses view_as_blocks from the skimage library to get non-overlapping views of an array.
import numpy as np
from skimage.util.shape import view_as_blocks
new_array = np.array([[34,15,16,17], [78,98,99,100], [23,78,79,80]])
First, you need to trim the original array so that its size is evenly divisible by the shape of your desired chunks. So, this 3x4 array will become a 2x4 array when we remove the last row.
chunk_shape = (2,2)
chunk_rows, chunk_cols = chunk_shape
rows_to_keep = new_array.shape[0] - new_array.shape[0] % chunk_rows
cols_to_keep = new_array.shape[1] - new_array.shape[1] % chunk_cols
temp = new_array[:rows_to_keep, :cols_to_keep]
print(temp)
# [[ 34  15  16  17]
#  [ 78  98  99 100]]
Now, we can use the view_as_blocks function to obtain chunks of the desired size and convert the result to a list of lists, with each chunk flattened:
res = view_as_blocks(temp, chunk_shape).reshape(-1, np.prod(chunk_shape)).tolist()
print(res)
# [[34, 15, 78, 98], [16, 17, 99, 100]]
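The trimming step above can be wrapped in a small helper so that it works for any number of dimensions. This is only a sketch: the name trim_to_multiple is hypothetical (it is not part of NumPy or skimage), and it assumes chunk_shape has one entry per array dimension.

```python
import numpy as np

def trim_to_multiple(arr, chunk_shape):
    """Return a view of `arr` cropped so each axis is a multiple of the chunk size.

    Hypothetical helper for illustration; not a NumPy or skimage function.
    """
    if len(chunk_shape) != arr.ndim:
        raise ValueError("chunk_shape needs one entry per array dimension")
    # For each axis, keep the longest prefix whose length divides evenly.
    slices = tuple(slice(0, n - n % c) for n, c in zip(arr.shape, chunk_shape))
    return arr[slices]

new_array = np.array([[34, 15, 16, 17], [78, 98, 99, 100], [23, 78, 79, 80]])
temp = trim_to_multiple(new_array, (2, 2))
print(temp.shape)  # (2, 4)
```

Because only basic slicing is used, temp is a view of new_array rather than a copy.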
This should work on any number of dimensions. Let’s take an array:
new_array = np.arange(25).reshape((5, 5))
Output:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
First calculate how many trailing rows/columns have to be dropped on each axis:
N = 2
rem = np.array(new_array.shape) % N
Output:
array([1, 1])
Then remove that many rows/columns from the end of the array along each axis:
for ax, v in enumerate(rem):
    if v != 0:
        new_array = np.delete(new_array, range(-1, -v-1, -1), axis=ax)
Output:
array([[ 0,  1,  2,  3],
       [ 5,  6,  7,  8],
       [10, 11, 12, 13],
       [15, 16, 17, 18]])
Then use np.split on each dimension:
arr_list = [new_array]
for ax in range(len(new_array.shape)):
    # Integer division keeps the section count an int
    arr_list_list = [np.split(arr, arr.shape[ax] // N, axis=ax) for arr in arr_list]
    arr_list = [arr for j in arr_list_list for arr in j]
Output:
[array([[0, 1],
        [5, 6]]),
 array([[2, 3],
        [7, 8]]),
 array([[10, 11],
        [15, 16]]),
 array([[12, 13],
        [17, 18]])]
Then flatten each chunk into a plain Python list:
[i.ravel().tolist() for i in arr_list]
Output:
[[0, 1, 5, 6], [2, 3, 7, 8], [10, 11, 15, 16], [12, 13, 17, 18]]
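The steps of this answer can be collected into one function. The following is a sketch under a few assumptions: the name split_into_blocks is hypothetical, the same block side n is used on every axis, every axis is at least n long (otherwise np.split is asked for zero sections and raises), and the cropping is done with slicing instead of np.delete.

```python
import numpy as np

def split_into_blocks(arr, n):
    """Split `arr` into non-overlapping blocks of side `n` along every axis.

    Hypothetical helper for illustration; assumes each axis has length >= n.
    """
    # Crop each axis to a multiple of n (basic slicing, so no copy is made).
    arr = arr[tuple(slice(0, s - s % n) for s in arr.shape)]
    blocks = [arr]
    for ax in range(arr.ndim):
        # Split every block collected so far along the current axis.
        blocks = [piece
                  for block in blocks
                  for piece in np.split(block, block.shape[ax] // n, axis=ax)]
    return blocks

blocks = split_into_blocks(np.arange(25).reshape(5, 5), 2)
flat = [b.ravel().tolist() for b in blocks]
print(flat)
# [[0, 1, 5, 6], [2, 3, 7, 8], [10, 11, 15, 16], [12, 13, 17, 18]]
```

If you want the 2D chunks themselves rather than flattened lists, use blocks directly.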
You can add another dimension to your array. There are several ways to do this. A relatively simple one is to take a view of the data cropped to the largest multiple of the window size that fits, then reshape and transpose. The catch is that as soon as you try to ravel the leading dimensions together, NumPy will generally have to copy the data, because the strides of the cropped, transposed subset can no longer be merged:
data = np.array([[34, 15, 16, 17],
                 [78, 98, 99, 100],
                 [23, 78, 79, 80]])
window = (2, 2)
trim = np.array(data.shape)
# Largest multiple of the window that fits along each axis
view_shape = trim - trim % window
view = data[tuple(slice(None, v) for v in view_shape)]
# Interleave (count, size) per axis, then move all the counts to the front
new_shape = np.stack((view_shape // window, window), -1).ravel()
axes = np.arange(len(new_shape))
result = view.reshape(new_shape).transpose(*axes[::2], *axes[1::2])
The shape of result starts with the number of times window fits into data along each axis, which is (1, 2) here. The remaining dimensions are the window itself, so the full shape is (1, 2, 2, 2). This should work for any number of dimensions, as long as the length of window matches the number of dimensions of data.
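To make the resulting layout concrete, here is the same reshape and transpose written out with hard-coded shapes for the 3×4 example. The literal shapes and axis order are specific to this input and are used only for illustration.

```python
import numpy as np

data = np.array([[34, 15, 16, 17],
                 [78, 98, 99, 100],
                 [23, 78, 79, 80]])

# Crop to 2x4, split each axis into (count, size), then move the counts first.
view = data[:2, :4]
result = view.reshape(1, 2, 2, 2).transpose(0, 2, 1, 3)

print(result.shape)                    # (1, 2, 2, 2)
print(result[0, 0].tolist())           # [[34, 15], [78, 98]]
print(result[0, 1].tolist())           # [[16, 17], [99, 100]]
print(np.shares_memory(result, data))  # True
```

The last line shows that result still refers to the original buffer, so no data has been copied at this point.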
If you need a 1D outer container instead of ND, you have two options. If you are OK copying data and want a monolithic array, you can do
result.reshape(-1, *result.shape[-data.ndim:])
If you want views into the original data for each window, you’ll have to use a list:
[result[i, j] for i in range(result.shape[0]) for j in range(result.shape[1])]
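The list comprehension above is written for two dimensions. A possible generalization to any number of dimensions uses np.ndindex to walk over the leading axes. This is a sketch: window_views is a hypothetical name, and it assumes window has one entry per dimension of data.

```python
import numpy as np

def window_views(data, window):
    """Return a flat list of non-overlapping window views into `data`.

    Hypothetical helper for illustration; partial windows at the edges are dropped.
    """
    if len(window) != data.ndim:
        raise ValueError("window needs one entry per dimension of data")
    # Number of complete windows that fit along each axis.
    counts = [s // w for s, w in zip(data.shape, window)]
    view = data[tuple(slice(0, c * w) for c, w in zip(counts, window))]
    # Interleave (count, size) per axis, then move all the counts to the front.
    new_shape = [n for pair in zip(counts, window) for n in pair]
    axes = list(range(2 * data.ndim))
    blocks = view.reshape(new_shape).transpose(axes[::2] + axes[1::2])
    # Indexing with one integer per leading axis yields one window each.
    return [blocks[idx] for idx in np.ndindex(*counts)]

data = np.array([[34, 15, 16, 17],
                 [78, 98, 99, 100],
                 [23, 78, 79, 80]])
chunks = window_views(data, (2, 2))
print([c.tolist() for c in chunks])
# [[[34, 15], [78, 98]], [[16, 17], [99, 100]]]
```

Each element of chunks is a view, so writing to a chunk also changes data.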