Multiply scipy sparse matrix with a 3d numpy array

Question:

I have the following matrices:

import numpy as np
import scipy.sparse as sp

a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))

and I would basically like to implement the following formula:

sigma_k = sum_i^N sum_j^N A_{i,j} (x_{i,k} - x_{j,k})^2

I can calculate the inner difference like this

diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)

a.shape  # -> (150, 150)

I would basically like to broadcast the element-wise multiplication between the scipy sparse matrix and each of the 20 (150, 150) slices of diff along its last axis.

If A were allowed to be dense, then I could simply do

np.einsum("ij,ijk->k", a.toarray(), (x[:, None, :] - x[None, :, :]) ** 2)

but A is sparse, and potentially huge, so this isn’t an option. Of course, I could just reorder the axes and loop over the diff array with a for loop, but is there a faster way using numpy?
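
Roughly, the loop version would look something like this (my sketch, computing one (150, 150) slice at a time rather than materialising the full diff array):

import numpy as np
import scipy.sparse as sp

a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))

result = np.zeros(x.shape[1])
for k in range(x.shape[1]):
    # dense (150, 150) slice for feature column k
    diff_k = (x[:, None, k] - x[None, :, k]) ** 2
    # element-wise product with the sparse matrix, then sum over i and j
    result[k] = a.multiply(diff_k).sum()

This keeps memory down to a single (150, 150) slice at a time, but it still computes all n^2 squared differences per column, which is the work I would like to avoid.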

As @hpaulj pointed out, the current solution also forms an intermediate array of shape (150, 150, 20), which would immediately lead to memory problems at larger sizes, so this solution is not acceptable either.
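
To put rough numbers on the memory issue (my own illustration, assuming 8-byte floats and a hypothetical 100,000 rows instead of 150):

n, k = 100_000, 20              # hypothetical sizes; the real data is "potentially huge"
diff_bytes = n * n * k * 8      # dense (n, n, k) float64 intermediate
print(diff_bytes / 1e12)        # -> 1.6, i.e. about 1.6 TB just for diff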

Asked By: Pavlin


Answers:

Try the following code:

import numpy as np

a = np.random.rand(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)

a.shape  # -> (150, 150)
b = np.einsum("ij,ijk->k", a, (x[:, None, :] - x[None, :, :]) ** 2)
result = b.sum()

For a sparse matrix, the following code can be used:

import scipy.sparse as sp
import numpy as np
a = sp.random(150, 150)
x = np.random.normal(0, 1, size=(150, 20))
diff = (x[:, None, :] - x[None, :, :]) ** 2
diff.shape  # -> (150, 150, 20)

a.shape  # -> (150, 150)
b = np.einsum("ij,ijk->k", a.toarray(), (x[:, None, :] - x[None, :, :]) ** 2)
result = b.sum()

where a is a sparse matrix in COO (coordinate) format of shape (150, 150).

Answered By: Suraj Kumar

import numpy as np
import scipy.sparse
from numpy.random import default_rng

rand = default_rng(seed=0)

# sigma_k = sum_i^N sum_j^N A_{i,j} (x_{i,k} - x_{j,k})^2

# Dense method
N = 100
x = rand.integers(0, 10, (N, 2))
A = np.clip(rand.integers(0, 100, (N, N)) - 80, a_min=0, a_max=None)
diff = (x[:, None, :] - x[None, :, :])**2
product = np.einsum("ij,ijk->k", A, diff)

# Loop method
s_loop = [0, 0]
for i in range(N):
    for j in range(N):
        for k in range(2):
            s_loop[k] += A[i, j]*(x[i, k] - x[j, k])**2
assert np.allclose(product, s_loop)

# For any i,j, we trivially know whether A_{i,j} is zero, and highly sparse matrices have more zeros
# than nonzeros. Crucially, do not calculate (x_{i,k} - x_{j,k})^2 at all if A_{i,j} is zero.
A_i_nz, A_j_nz = A.nonzero()
diff = (x[A_i_nz, :] - x[A_j_nz, :])**2
s_semidense = A[A_i_nz, A_j_nz].dot(diff)
assert np.allclose(product, s_semidense)

# You can see where this is going:
A_sparse = scipy.sparse.coo_array(A)
diff = (x[A_sparse.row, :] - x[A_sparse.col, :])**2
s_sparse = A_sparse.data.dot(diff)
assert np.allclose(product, s_sparse)
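
If A is stored in CSR or CSC form rather than COO, the same trick applies after a cheap conversion; a small sketch, continuing the script above (assumes scipy >= 1.8 for csr_array):

A_csr = scipy.sparse.csr_array(A)    # same matrix, compressed-row storage
A_coo = A_csr.tocoo()                # conversion exposes the row/col/data triplets
s_from_csr = A_coo.data.dot((x[A_coo.row, :] - x[A_coo.col, :])**2)
assert np.allclose(product, s_from_csr)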

Seems reasonably fast; this completes in about a second:

N = 100_000_000
print(f'Big test: initialising a {N}x{N} array')
n_values = 10_000_000
A = scipy.sparse.coo_array(
    (
        rand.integers(0, 100, n_values),
        rand.integers(0, N, (2, n_values)),
    ),
    shape=(N, N),
)
x = rand.integers(0, 100, (N, 2))

print('Big test: calculating')
s = A.data.dot((x[A.row, :] - x[A.col, :])**2)

print(s)
Answered By: Reinderien