Multiply a (N,N) matrix by a (N,M,O) matrix along the O dimension with Numba

Question:

I’m trying to multiply a matrix A of size $(N,N)$ by an array B of size $(N,M,O)$ along the O dimension (that is, left-multiply all the "pages" of B along the O dimension by A), using a jitted Numba function.

I have come up with this solution:

import numpy as np
from numba import njit

@njit
def fast_expectation(Pi, X):
    # res[i, j, k] = sum_b Pi[i, b] * X[b, j, k]
    res = np.empty_like(X)
    for i in range(Pi.shape[0]):
        for j in range(X.shape[1]):
            for k in range(X.shape[2]):
                res[i, j, k] = np.dot(Pi[i, :], X[:, j, k])
    return res

However, this returns a warning: NumbaPerformanceWarning: np.dot() is faster on contiguous arrays, called on (array(float64, 1d, C), array(float64, 1d, A)). Do you know how I could perform this in a fast way, with a Numba-compatible function?

I tried running the previous code after swapping the axes of B (turning it into an (O,M,N) array). That didn’t work either.

Edit:

I also tried the following code:

@njit
def multiply_ith_dimension(Pi, i, X):
    """If Pi is a matrix, multiply Pi times the ith dimension of X and return"""
    X = np.swapaxes(X, 0, i)
    shape = X.shape
    X = X.reshape(shape[0], -1)

    # iterate forward using Pi
    X = Pi @ X

    # reverse steps
    X = X.reshape(Pi.shape[0], *shape[1:])
    return np.swapaxes(X, 0, i)

which also gives me an error:

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
- Resolution failure for literal arguments:
reshape() supports contiguous array only
...
    <source elided>
    shape = X.shape
    X = X.reshape(shape[0], -1)
    ^
Asked By: Mr. Fafa


Answers:

You could easily do this with np.einsum (or, even faster, with the opt_einsum library).

result = np.einsum('ab,bcd->acd', A, B, optimize=True)

Note that einsum uses np.dot under the hood (which is backed by BLAS), so it is probably the fastest possible approach.
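
For reference, here is a minimal sketch of both variants. The opt_einsum package (and its contract function) is a separate install, and the sizes below are illustrative only:

import numpy as np
import opt_einsum as oe

N, M, O = 4, 5, 6                        # illustrative sizes
A = np.random.rand(N, N)
B = np.random.rand(N, M, O)

res_np = np.einsum('ab,bcd->acd', A, B, optimize=True)
res_oe = oe.contract('ab,bcd->acd', A, B)   # same contraction via opt_einsum
assert np.allclose(res_np, res_oe)

Keep in mind that Numba does not support np.einsum in nopython mode, so this runs as a regular NumPy call rather than inside an @njit function.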

Answered By: wizzzz1

However, this returns a warning NumbaPerformanceWarning: np.dot() is faster on contiguous arrays, called on (array(float64, 1d, C), array(float64, 1d, A))

This is because you access a row-major Numpy array using X[:,j,k]. The last dimension is contiguous in memory, but the first is not. You can fix that with a transposition. That being said, a Numpy transposition does not create a new array; it creates a strided view. You can force a contiguous array to be created with np.ascontiguousarray, or simply copy the array explicitly, e.g. arr.T.copy().
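
A minimal sketch of that idea (the function name fast_expectation_contig and the (O, M, N) layout are my own choices for illustration): pass in a contiguous transpose so every 1D slice handed to np.dot is contiguous.

import numpy as np
from numba import njit

@njit
def fast_expectation_contig(Pi, Xt):
    # Xt is assumed to be np.ascontiguousarray(X.transpose(2, 1, 0)),
    # so Xt[k, j, :] (== X[:, j, k]) is a contiguous 1D slice.
    res = np.empty_like(Xt)
    for k in range(Xt.shape[0]):
        for j in range(Xt.shape[1]):
            for i in range(Pi.shape[0]):
                res[k, j, i] = np.dot(Pi[i, :], Xt[k, j, :])
    return res

# Usage: transpose in, then transpose the result back out:
# Xt  = np.ascontiguousarray(X.transpose(2, 1, 0))
# res = fast_expectation_contig(Pi, Xt).transpose(2, 1, 0)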

An alternative solution is simply not to use np.dot at all and to write plain loops instead. Loops are generally very efficient in Numba, while calling Numpy functions can introduce some overhead (mainly due to allocations, temporary arrays, or implicit casting).
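
Here is a sketch of such a loop-only version (the loop order, with the contiguous last axis innermost, is my own arrangement, not code from the answer):

import numpy as np
from numba import njit

@njit
def fast_expectation_loops(Pi, X):
    # res[i, j, k] = sum_b Pi[i, b] * X[b, j, k], accumulated manually
    # so no non-contiguous slice is ever passed to np.dot.
    N, M, O = X.shape
    res = np.zeros((N, M, O))
    for i in range(N):
        for b in range(N):
            p = Pi[i, b]
            for j in range(M):
                for k in range(O):          # contiguous innermost axis
                    res[i, j, k] += p * X[b, j, k]
    return res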

which also gives me an error [...] reshape() supports contiguous array only

AFAIK, this is a current limitation of Numba. Numpy handles it correctly (at least I got no issue on my machine with Numpy 1.22.4). This is certainly because Numba wants the result to always be a view, while Numpy can actually return a copy in some cases (as happens here).
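
One possible workaround, sketched below under the assumption that your Numba version supports np.swapaxes and np.ascontiguousarray (the question's own traceback suggests it does), is to force a contiguous copy before the reshape so Numba's view-only reshape succeeds:

import numpy as np
from numba import njit

@njit
def multiply_ith_dimension_contig(Pi, i, X):
    # Same logic as multiply_ith_dimension, but copy to a contiguous
    # array first so the reshape is a plain view of that copy.
    Xs = np.swapaxes(X, 0, i)
    shape = Xs.shape
    X2 = np.ascontiguousarray(Xs).reshape(shape[0], -1)
    Y = Pi @ X2                              # one 2D BLAS matmul
    # tuple concatenation instead of *-unpacking, which is friendlier
    # to nopython mode
    Y = Y.reshape((Pi.shape[0],) + shape[1:])
    return np.swapaxes(Y, 0, i)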


Note that your first Numba code is inefficient because it is sequential and naive (no tiling, unrolling, or anything like that). A @ B is generally very efficient because it makes use of a BLAS library. BLAS implementations like OpenBLAS (the default on most platforms) or the Intel MKL have been highly optimised over decades by experts of the domain. They use multiple threads when possible and much more sophisticated optimized code (typically using SIMD instructions manually in C).
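
For this specific operation, the whole page-wise product can even be written as a single GEMM in plain NumPy, since B is C-contiguous and reshaping it to 2D is free (a sketch with illustrative sizes):

import numpy as np

N, M, O = 4, 5, 6
A = np.random.rand(N, N)
B = np.random.rand(N, M, O)

# res[i, j, k] = sum_b A[i, b] * B[b, j, k], as one BLAS call:
res = (A @ B.reshape(N, M * O)).reshape(N, M, O)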

Answered By: Jérôme Richard