How to find zero elements in a sparse matrix

Question:

I know that scipy.sparse.find(A) returns three arrays I, J, V containing the rows, columns, and values of the nonzero elements, respectively.

What I want is a way to do the same (except for the V array) for all zero elements without having to iterate through the matrix, since it's too large.

Asked By: milouk


Answers:

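Assuming you have a scipy sparse array named your_sparse_array:

from itertools import product
from scipy.sparse import find

I, J, _ = find(your_sparse_array)
# Store the pairs in a set: a bare zip() is a one-shot iterator in Python 3,
# so repeated "in" tests against it would silently give wrong answers.
nonzero = set(zip(I, J))
nrows, ncols = your_sparse_array.shape
for a, b in product(range(nrows), range(ncols)):
    if (a, b) not in nonzero:
        print(a, b)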
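
The Python-level loop above visits every element, which is slow for large matrices. As a minimal sketch (not part of the original answer), the same result can be computed with a boolean mask. The helper name zero_indices_via_mask is made up for illustration, and the approach assumes that a boolean array of the full shape (one byte per element) still fits in memory.

```python
import numpy as np
from scipy import sparse


def zero_indices_via_mask(mat):
    """Return (rows, cols) of all zero elements of a scipy sparse matrix."""
    rows, cols, _ = sparse.find(mat)          # coordinates of the nonzero elements
    is_zero = np.ones(mat.shape, dtype=bool)  # start by assuming every element is zero
    is_zero[rows, cols] = False               # clear the positions that hold nonzeros
    return np.nonzero(is_zero)                # zero coordinates in row-major order


example = sparse.csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
zero_rows, zero_cols = zero_indices_via_mask(example)
print(list(zip(zero_rows.tolist(), zero_cols.tolist())))
```

For this 3x3 example the printed pairs are (0, 2), (1, 0), (1, 1) and (2, 1).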
Answered By: Alex Alifimoff

Make a small sparse matrix with 10% density:

In [1]: from scipy import sparse
In [2]: M = sparse.random(10,10,.1)
In [3]: M
Out[3]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 10 stored elements in COOrdinate format>

The 10 nonzero values:

In [5]: sparse.find(M)
Out[5]: 
(array([6, 4, 1, 2, 3, 0, 1, 6, 9, 6], dtype=int32),
 array([1, 2, 3, 3, 3, 4, 4, 4, 5, 8], dtype=int32),
 array([ 0.91828586,  0.29763717,  0.12771201,  0.24986069,  0.14674883,
         0.56018409,  0.28643427,  0.11654358,  0.8784731 ,  0.13253971]))

If, out of the 100 elements of the matrix, 10 are nonzero, then 90 elements are zero. Do you really want the indices of all of those?

np.where or np.nonzero on the dense equivalent gives the same indices, sorted in row-major order:

In [6]: A = M.A # dense
In [7]: np.where(A)
Out[7]: 
(array([0, 1, 1, 2, 3, 4, 6, 6, 6, 9], dtype=int32),
 array([4, 3, 4, 3, 3, 2, 1, 4, 8, 5], dtype=int32))

And the indices of the 90 zero values:

In [8]: np.where(A==0)
Out[8]: 
(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5,
        5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7,
        7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9], dtype=int32),
 array([0, 1, 2, 3, 5, 6, 7, 8, 9, 0, 1, 2, 5, 6, 7, 8, 9, 0, 1, 2, 4, 5, 6,
        7, 8, 9, 0, 1, 2, 4, 5, 6, 7, 8, 9, 0, 1, 3, 4, 5, 6, 7, 8, 9, 0, 1,
        2, 3, 4, 5, 6, 7, 8, 9, 0, 2, 3, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7,
        8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 6, 7, 8, 9], dtype=int32))

That’s 2 arrays of shape (90,), 180 integers in total, as opposed to the 100 values in the dense array itself. If your sparse matrix is too large to convert to dense, it will also be too large to produce all the zero indices (assuming reasonable sparsity).
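
The size comparison above can be checked with a little arithmetic. This is only a rough sketch: the helper name index_vs_dense_bytes is hypothetical, and it assumes int32 indices and float64 values, as in the outputs shown here.

```python
import numpy as np


def index_vs_dense_bytes(shape, nnz, index_dtype=np.int32, value_dtype=np.float64):
    """Compare the memory for all zero indices with that of the dense array."""
    n_rows, n_cols = shape
    n_zeros = n_rows * n_cols - nnz
    # One row index and one column index per zero element.
    index_bytes = 2 * n_zeros * np.dtype(index_dtype).itemsize
    # One value per element of the dense array.
    dense_bytes = n_rows * n_cols * np.dtype(value_dtype).itemsize
    return n_zeros, index_bytes, dense_bytes


small = index_vs_dense_bytes((10, 10), 10)
large = index_vs_dense_bytes((100_000, 100_000), 1_000_000)
print(small)
print(large)
```

For the 10x10 matrix this gives 90 zeros, 720 bytes of indices and 800 bytes for the dense array, so the two are of the same order. The same holds for the larger, hypothetical shape.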

print(M) shows the same triplets as find. The attributes of the coo format also give the nonzero indices:

In [13]: M.row
Out[13]: array([6, 6, 3, 4, 1, 6, 9, 2, 1, 0], dtype=int32)
In [14]: M.col
Out[14]: array([1, 4, 3, 2, 3, 8, 5, 3, 4, 4], dtype=int32)

(Sometimes manipulation of a matrix can set values to 0 without removing them from the attributes. So find/nonzero takes an added step to remove those, if any.)
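
A small illustration of such explicitly stored zeros, as a sketch on a made-up 2x2 matrix: overwriting an entry of the data attribute leaves the slot stored, find skips it, and eliminate_zeros removes it.

```python
from scipy import sparse

M2 = sparse.csr_matrix([[1, 0], [0, 2]])
M2.data[0] = 0                        # overwrite a stored value; the slot stays stored
stored_before = M2.nnz                # counts stored entries, including the explicit zero
true_nonzeros = M2.count_nonzero()    # counts only entries whose value is nonzero
rows, cols, vals = sparse.find(M2)    # find() filters out the explicit zero
M2.eliminate_zeros()                  # drop explicit zeros from the stored data
stored_after = M2.nnz
print(stored_before, true_nonzeros, stored_after)
print(rows, cols, vals)
```

Here nnz drops from 2 to 1 after eliminate_zeros, while count_nonzero and find report a single nonzero element throughout.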


We could apply find to M==0 as well – but sparse will give us a warning.

In [15]: sparse.find(M==0)
/usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py:213: SparseEfficiencyWarning: Comparing a sparse matrix with 0 using == is inefficient, try using != instead.
  ", try using != instead.", SparseEfficiencyWarning)

It’s the same thing that I’ve been warning about – the large size of this set. The resulting arrays are the same as in Out[8].

Answered By: hpaulj

Here is my solution to find the indices for the zero values:

from scipy.sparse import csr_matrix
# csrm is an existing csr_matrix; (csrm.A == 0) marks the zero positions of its dense form
csrm_reversed = csr_matrix((csrm.A == 0) * 1)
csrm_reversed.nonzero()

For example:

from scipy.sparse import csr_matrix
csrm = csr_matrix([[1,2,0],[0,0,3],[4,0,5]])
csrm.nonzero()

you will get the nonzero indices:

(array([0, 0, 1, 2, 2], dtype=int32), array([0, 1, 2, 0, 2], dtype=int32))

and then to find the zero indices:

csrm_reversed = csr_matrix((csrm.A == 0) * 1)
csrm_reversed.nonzero()

you will get:

(array([0, 1, 1, 2], dtype=int32), array([2, 0, 1, 1], dtype=int32))

The dense format of the matrix is:

[[1, 2, 0],
[0, 0, 3],
[4, 0, 5]]
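
Note that this solution converts the whole matrix to a dense array first. If that does not fit in memory, one possible alternative is to work through the CSR structure one row at a time, so that only the zero columns of a single row are held at once. The following is a sketch under that assumption; the generator name iter_zero_columns is made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix


def iter_zero_columns(mat):
    """Yield (row, array_of_zero_columns) for each row of a sparse matrix."""
    csr = mat.tocsr()
    all_cols = np.arange(csr.shape[1])
    for row in range(csr.shape[0]):
        start, end = csr.indptr[row], csr.indptr[row + 1]
        stored_cols = csr.indices[start:end]
        # Ignore explicitly stored zeros so they count as zero elements.
        nonzero_cols = stored_cols[csr.data[start:end] != 0]
        yield row, np.setdiff1d(all_cols, nonzero_cols)


example = csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
zero_cols_by_row = [(row, cols.tolist()) for row, cols in iter_zero_columns(example)]
print(zero_cols_by_row)
```

For the example matrix this yields column 2 for row 0, columns 0 and 1 for row 1, and column 1 for row 2, matching the indices above. It still loops over rows, but not over individual elements.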
Answered By: Minstein