What is the best approach to compute the trace of a (sparse) matrix product efficiently in python

Question:

I’m trying to take the Hilbert-Schmidt inner product of two matrices.

For two matrices A and B, this operation is the matrix product of the Hermitian conjugate of A with B, followed by the trace (summing down the diagonal). Since only the diagonal entries are needed, computing the full matrix product is wasteful: the off-diagonal terms are never used.

In effect one needs to compute:

Tr(A†B) = Σ_{ij} (A_ij)* · B_ij

where i indexes rows and j indexes columns.
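To make the identity concrete, here is a minimal sanity check on small dense arrays (added purely for illustration; the names X and Y are arbitrary):

import numpy as np

# small dense check: the trace of X† Y equals the elementwise sum of conj(X) * Y
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Y = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

lhs = np.trace(X.conj().T @ Y)   # full product, then trace
rhs = np.sum(X.conj() * Y)       # elementwise product, then sum
assert np.isclose(lhs, rhs)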

What is the fastest way to do this for sparse matrices? I found the following similar article:

What is the best way to compute the trace of a matrix product in numpy?

Currently, I am doing:

import numpy as np

def hilbert_schmidt_inner_product(mat1, mat2):

    ## find the (i, j) indices of the nonzero entries of each matrix
    mat1_ij = set(zip(*mat1.nonzero()))
    mat2_ij = set(zip(*mat2.nonzero()))

    ## keep only the indices that are nonzero in both matrices
    common_ij = np.array(list(mat1_ij & mat2_ij))

    ## select the common (i, j) entries from both (each is now a 1D array)
    mat1_survived = np.array(mat1[common_ij[:, 0], common_ij[:, 1]])[0]
    mat2_survived = np.array(mat2[common_ij[:, 0], common_ij[:, 1]])[0]

    ## multiply the common nonzero elements (a 1D dot product)
    trace = np.dot(mat1_survived.conj(), mat2_survived)
    return trace

However, this is slower than:

import numpy as np
sum((mat1.conj().T@mat2).diagonal())

which computes the full matrix product before taking the trace, and therefore wastes work on the off-diagonal elements. Is there a better way of doing this?
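For illustration, one way to compute only the diagonal entries directly would be something like the sketch below (not benchmarked here; the per-column Python loop is likely the bottleneck), using the fact that (A†B)[j, j] is just the inner product of column j of A with column j of B:

def trace_of_product(A, B):
    # accumulate only the diagonal entries of A† B, one column at a time
    A = A.tocsc()   # CSC makes column slicing cheap
    B = B.tocsc()
    total = 0.0
    for j in range(A.shape[1]):
        # (A† B)[j, j] = sum_k conj(A[k, j]) * B[k, j]
        total += A[:, j].conj().multiply(B[:, j]).sum()
    return total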

I am using the following to benchmark:

import numpy as np
from scipy.sparse import rand

Dimension = 2**12

A = rand(Dimension, Dimension, density=0.001, format='csr')
B = rand(Dimension, Dimension, density=0.001, format='csr')

Running a few tests, I find:

%timeit hilbert_schmidt_inner_product(A,B)
49.2 ms ± 3.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit sum((A.conj().T@B).diagonal())
1.48 ms ± 32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.einsum('ij,ij->', A.conj().todense(), B.todense())
53.9 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Asked By: lex2763


Answers:

Another option is (A.conj().multiply(B)).sum(). This computes the same quantity because Tr(A†B) = Σ_{ij} (A_ij)* · B_ij, i.e. the sum of the elementwise product of conj(A) and B, and the sparse elementwise multiply only touches entries that are nonzero in both matrices.

In [111]: Dimension = 2**12

In [112]: A = rand(Dimension, Dimension, density=0.001, format='csr')
     ...: B = rand(Dimension, Dimension, density=0.001, format='csr')

Compare to sum((A.conj().T @ B).diagonal()):

In [113]: sum((A.conj().T @ B).diagonal())
Out[113]: 4.152218112255467

In [114]: (A.conj().multiply(B)).sum()
Out[114]: 4.152218112255466

In [115]: %timeit sum((A.conj().T @ B).diagonal())
2.7 ms ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [116]: %timeit (A.conj().multiply(B)).sum()
1.12 ms ± 4.39 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Of course, for larger values of Dimension, the relative performance difference is much greater (at fixed density, the full sparse matrix multiply scales roughly as O(Dimension**3), while the elementwise multiply scales as O(Dimension**2)):

In [119]: Dimension = 2**14

In [120]: A = rand(Dimension, Dimension, density=0.001, format='csr')
     ...: B = rand(Dimension, Dimension, density=0.001, format='csr')

In [121]: sum((A.conj().T @ B).diagonal())
Out[121]: 69.23254213582365

In [122]: (A.conj().multiply(B)).sum()
Out[122]: 69.23254213582364

In [123]: %timeit sum((A.conj().T @ B).diagonal())
124 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [124]: %timeit (A.conj().multiply(B)).sum()
8.67 ms ± 63.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
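For reference, a minimal way to package this (the wrapper names hs_inner_product and complex_sparse are just illustrative; complex test matrices are built by hand since rand only draws real values, so the conj() actually matters):

import numpy as np
from scipy.sparse import rand

def hs_inner_product(A, B):
    # Tr(A† B) = sum_ij conj(A_ij) * B_ij, without ever forming A† B
    return (A.conj().multiply(B)).sum()

def complex_sparse(n, density=0.001):
    # rand() only generates real values, so add an imaginary part by hand
    return (rand(n, n, density=density, format='csr')
            + 1j * rand(n, n, density=density, format='csr'))

n = 2**10
A = complex_sparse(n)
B = complex_sparse(n)

# agrees with the dense computation up to floating-point error
expected = np.trace(A.conj().T.toarray() @ B.toarray())
assert np.isclose(hs_inner_product(A, B), expected)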
Answered By: Warren Weckesser