Shift each non-zero value in a 3D numpy array using a vectorized method

Question:

My problem is in 3D, but I will explain it in 2D. Suppose I have a NumPy array that is mostly zeros with a few ones. (In my real problem, the grid is about 1000 x 1000 x 1000 and contains roughly 10 to 1000 ones.)

positions = np.array([[0, 0, 0, 0, 0],
                      [0, 0, 1, 0, 0],
                      [0, 0, 0, 0, 0],
                      [0, 0, 0, 0, 1],
                      [0, 0, 0, 0, 0]])

I also have two other arrays (one per axis, so two in 2D) that tell me how I want to move each of the ones. They might look like this:

shift_x = np.array([[0, 0,  0, 0, 0],
                    [0, 0, -1, 0, 0],
                    [0, 0,  0, 0, 0],
                    [0, 0,  0, 0, 1],
                    [0, 0,  0, 0, 0]])

shift_y = np.array([[0, 0, 0, 0,  0],
                    [0, 0, 1, 0,  0],
                    [0, 0, 0, 0,  0],
                    [0, 0, 0, 0, -1],
                    [0, 0, 0, 0,  0]])

I then want to apply shift_x and shift_y to the positions array, where a positive shift_x moves a one right, a positive shift_y moves it up, and moves wrap around the edges. For example, the "1" in the second row of positions moves left (-1) and up (1), while the "1" in the fourth row moves right (wrapping around to the first column) and down. The new array would look like this:

new_positions = np.array([[0, 1, 0, 0, 0],
                          [0, 0, 0, 0, 0],
                          [0, 0, 0, 0, 0],
                          [0, 0, 0, 0, 0],
                          [1, 0, 0, 0, 0]])
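Reading the two moves off the example gives the index mapping below. Rows are indexed by i (0 to m-1, top to bottom) and columns by j (0 to n-1); the symbols Δx and Δy stand for shift_x[i, j] and shift_y[i, j] and are introduced here only for illustration. The mod is the non-negative (floor) modulo, which is what Python's and NumPy's % give for a positive divisor.

```latex
\begin{aligned}
(i, j) &\mapsto \bigl((i - \Delta y) \bmod m,\ (j + \Delta x) \bmod n\bigr) \\
(1, 2),\ \Delta x = -1,\ \Delta y = 1 &\mapsto \bigl((1 - 1) \bmod 5,\ (2 - 1) \bmod 5\bigr) = (0, 1) \\
(3, 4),\ \Delta x = 1,\ \Delta y = -1 &\mapsto \bigl((3 + 1) \bmod 5,\ (4 + 1) \bmod 5\bigr) = (4, 0)
\end{aligned}
```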

I realize I can write this as a for loop that moves each one individually, by shifting its index according to the shift_x and shift_y arrays. I’ve done that before, but it is too slow when working with large 3D arrays. I want a vectorized approach that uses NumPy’s capabilities. Can someone suggest a set of vectorized operations that would accomplish this task?
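One way to vectorize the index shifting, sketched under the assumption that every shift array is an integer array with the same shape as positions: gather the coordinates of the non-zero cells with np.nonzero, look up each cell's offsets with fancy indexing, wrap the new coordinates with %, and scatter the values into a fresh array. The function name shift_nonzero and the "one offset array per axis, added to that axis index" layout are illustrative choices, not part of the question; with that convention the 2D example corresponds to shifts=(-shift_y, shift_x), because rows grow downward while a positive shift_y means up.

```python
import numpy as np


def shift_nonzero(positions, shifts):
    """Move every non-zero cell of an N-D array by its own offset, with wrap-around.

    shifts: one integer array per axis, each shaped like `positions` (hypothetical
    layout); shifts[a][cell] is ADDED to the axis-a index of that cell.
    """
    idx = np.nonzero(positions)  # tuple of N index arrays, one entry per non-zero cell
    new_idx = tuple(
        (i + s[idx]) % size  # gather offsets only at the non-zero cells, then wrap
        for i, s, size in zip(idx, shifts, positions.shape)
    )
    out = np.zeros_like(positions)
    # Scatter the values; if two cells land on the same spot, only one value survives.
    out[new_idx] = positions[idx]
    return out


# 2D example from the question, expressed as per-axis offsets (-shift_y, shift_x).
positions = np.zeros((5, 5), dtype=int)
positions[1, 2] = positions[3, 4] = 1
shift_x = np.zeros_like(positions)
shift_y = np.zeros_like(positions)
shift_x[1, 2], shift_y[1, 2] = -1, 1
shift_x[3, 4], shift_y[3, 4] = 1, -1
print(shift_nonzero(positions, (-shift_y, shift_x)))

# Small 3D check on a 4x4x4 grid: (0, 0, 0) -> (3, 2, 0) and (2, 3, 1) -> (3, 0, 3).
grid = np.zeros((4, 4, 4), dtype=np.int8)
grid[0, 0, 0] = grid[2, 3, 1] = 1
s0, s1, s2 = (np.zeros(grid.shape, dtype=np.int64) for _ in range(3))
s0[0, 0, 0], s1[0, 0, 0], s2[0, 0, 0] = -1, 2, 0
s0[2, 3, 1], s1[2, 3, 1], s2[2, 3, 1] = 1, 1, -2
print(np.argwhere(shift_nonzero(grid, (s0, s1, s2))))
```

Only the k non-zero cells are gathered and scattered, so apart from the output array the extra memory is a few index arrays of length k; the shift arrays themselves are still dense, as in the question.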

Asked By: Scott


Answers:

With arrays so big and sparse, have you considered using a data type different from NumPy arrays? Using plain dicts instead would allow you to only store the 1s and not the 0s, making the whole thing lighter and faster.
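A minimal sketch of the dict idea, assuming (hypothetically; the answer does not fix a layout) that each occupied coordinate tuple maps to its own per-axis offset tuple. The loop runs only over the stored ones, so its cost scales with the 10 to 1000 ones rather than with the full grid, and the same code works unchanged in 3D.

```python
def shift_sparse(ones, shape):
    """Return the set of occupied coordinates after moving each one.

    ones:  {coordinate tuple: per-axis offset tuple}  (hypothetical layout)
    shape: grid size per axis; moves wrap around periodically.
    Two ones that land on the same cell merge into one entry of the set.
    """
    return {
        tuple((c + d) % size for c, d, size in zip(coord, offset, shape))
        for coord, offset in ones.items()
    }


# The question's 2D example: rows grow downward and +shift_y means "up",
# so the per-axis offset of each one is (-shift_y, shift_x).
ones = {(1, 2): (-1, -1), (3, 4): (1, 1)}
print(sorted(shift_sparse(ones, (5, 5))))  # [(0, 1), (4, 0)]
```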

Even if you stick with NumPy arrays, achieving your desired operation through vectorized NumPy calls will most probably come with significant memory overhead. I would suggest trying a Numba solution instead, which compiles simple loop code to machine code.

Below I have a simple loop implementation, with and without Numba:

from time import time
import numpy as np

positions = np.array([[0, 0, 0, 0, 0],
                      [0, 0, 1, 0, 0],
                      [0, 0, 0, 0, 0],
                      [0, 0, 0, 0, 1],
                      [0, 0, 0, 0, 0]])

shift_x = np.array([[0, 0,  0, 0, 0],
                    [0, 0, -1, 0, 0],
                    [0, 0,  0, 0, 0],
                    [0, 0,  0, 0, 1],
                    [0, 0,  0, 0, 0]])

shift_y = np.array([[0, 0, 0, 0,  0],
                    [0, 0, 1, 0,  0],
                    [0, 0, 0, 0,  0],
                    [0, 0, 0, 0, -1],
                    [0, 0, 0, 0,  0]])

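def shift(positions, shift_x, shift_y):
    m, n = positions.shape
    # Write into a fresh array so a one that has already been moved is not
    # visited (and moved, or overwritten) again later in the loop.
    new_positions = np.zeros_like(positions)
    for i in range(m):
        for j in range(n):
            if positions[i, j] == 0:
                continue
            dx = shift_x[i, j]
            dy = shift_y[i, j]
            # Positive dy moves up (smaller row index); % wraps around the edges.
            new_positions[(i - dy) % m, (j + dx) % n] = 1
    return new_positions

tic = time()
new_positions = shift(positions, shift_x, shift_y)
toc = time()
print('no Numba:', toc - tic)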


######## Numba version below #########


import numba

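@numba.njit
def shift(positions, shift_x, shift_y):
    m, n = positions.shape
    # Collect the moved ones in a separate output array so none is moved twice.
    new_positions = np.zeros_like(positions)
    for i in range(m):
        for j in range(n):
            if positions[i, j] == 0:
                continue
            dx = shift_x[i, j]
            dy = shift_y[i, j]
            # Positive dy moves up (smaller row index); % wraps around the edges.
            new_positions[(i - dy) % m, (j + dx) % n] = 1
    return new_positions

# Initial call, just for compiling (same dtype and number of dimensions as the real arrays)
tmp = np.zeros((2, 2), dtype=positions.dtype)
shift(tmp, tmp, tmp)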

# Actual call
tic = time()
shift(positions, shift_x, shift_y)
toc = time()
print('Numba:', toc - tic)

Even for this very small example, Numba provides a measurable speedup. For large arrays the difference should be much larger, since the pure-Python version pays interpreter overhead on every one of the m x n cells.

Note that I first call the Numba function with "fake data". This first call is slower than later calls because it triggers compilation, so it’s best to pass small (in data size) arguments here. They should have the same NumPy dtype and number of dimensions as your actual arrays, because Numba compiles a separate version for each combination of argument types.

Answered By: jmd_dk