How to delete small elements in sparse matrix in Python's SciPy?
Question:
I have a question that is quite similar to Sean Law’s example, which you can find here:
https://seanlaw.github.io/2019/02/27/set-values-in-sparse-matrix/
In my case, I want to delete all elements of a sparse CSR matrix whose absolute value is smaller than some epsilon.
First I tried something like
x[abs(x) < 3] = 0
but SciPy’s warning about inefficiency led me to Sean Law’s explanation in the link above. I then tried adapting his example code, but could not find a solution to my problem.
Here is the code, with some negative entries added. As written, it also removes all negative entries, because they are smaller than 3. I experimented with np.abs() and with adding a second logical condition, but have not succeeded so far.
import numpy as np
from scipy.sparse import csr_matrix
x = csr_matrix(np.array([[1, 0.1, -2, 0, 3],
                         [0, -4, -1, 5, 0]]))
nonzero_mask = np.array(x[x.nonzero()] < 3)[0]
rows = x.nonzero()[0][nonzero_mask]
cols = x.nonzero()[1][nonzero_mask]
x[rows, cols] = 0
print(x.todense())
gives
[[0. 0. 0. 0. 3.]
[0. 0. 0. 5. 0.]]
But what I want is
[[0. 0. 0. 0. 3.]
[0. -4. 0. 5. 0.]]
Any help is greatly appreciated; I feel like I am missing something very basic.
Thank you in advance!
Answers:
Wrapping x[x.nonzero()] in np.abs() solves the problem:
>>> nonzero_mask = np.array(np.abs(x[x.nonzero()]) < 3)[0]
...
>>> print(x.todense())
[[ 0. 0. 0. 0. 3.]
[ 0. -4. 0. 5. 0.]]
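For reference, the full snippet from the question with this fix applied might look like the sketch below. The names eps, values and small are introduced here for illustration only, and the final eliminate_zeros() call is an optional extra that also drops the zeroed entries from storage.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.array([[1, 0.1, -2, 0, 3],
                         [0, -4, -1, 5, 0]]))

eps = 3  # threshold used in the question

# Look up the coordinates of the nonzero entries once and reuse them.
rows, cols = x.nonzero()

# Indexing a csr_matrix with coordinate arrays returns a 1xN matrix;
# flatten it to a plain 1-D array before building the mask.
values = np.asarray(x[rows, cols]).ravel()

# True where the absolute value is below the threshold.
small = np.abs(values) < eps

# Zero only those entries, then drop the explicit zeros from storage.
x[rows[small], cols[small]] = 0
x.eliminate_zeros()

result = x.toarray()
print(result)
```

Calling x.nonzero() once, rather than three times as in the question, avoids recomputing the coordinates.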
In [286]: from scipy import sparse
In [287]: x = sparse.csr_matrix(np.array([[1, 0.1, -2, 0, 3],
     ...:                                 [0, -4, -1, 5, 0]]))
Your test on x also selects the implicit 0 values (since 0 < 3), hence the efficiency warning. Applied to just the stored values in the data attribute, it avoids that:
In [288]: x.data
Out[288]: array([ 1. , 0.1, -2. , 3. , -4. , -1. , 5. ])
In [289]: mask = np.abs(x.data)<3
In [290]: mask
Out[290]: array([ True, True, True, False, False, True, False])
In [291]: x.data[mask]=0
In [292]: x.toarray()
Out[292]:
array([[ 0., 0., 0., 0., 3.],
[ 0., -4., 0., 5., 0.]])
This only sets the stored values to zero; it doesn’t actually remove the elements from the matrix. There is a method for that cleanup:
In [293]: x
Out[293]:
<2x5 sparse matrix of type '<class 'numpy.float64'>'
with 7 stored elements in Compressed Sparse Row format>
In [294]: x.eliminate_zeros()
In [295]: x
Out[295]:
<2x5 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>
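The two steps above (mask the data attribute, then call eliminate_zeros()) can be wrapped in a small helper. This is only a sketch: the function name drop_small and its eps parameter are made up for illustration, and it assumes a CSR input as in the question. It works on a copy so the caller's matrix is left unchanged.

```python
import numpy as np
from scipy import sparse


def drop_small(x, eps):
    """Return a copy of x without stored entries where |value| < eps.

    Hypothetical helper, not part of SciPy.
    """
    out = x.copy()
    # Only the stored values are tested, so implicit zeros are never touched.
    out.data[np.abs(out.data) < eps] = 0
    # Remove the explicit zeros from the sparse structure.
    out.eliminate_zeros()
    return out


x = sparse.csr_matrix(np.array([[1, 0.1, -2, 0, 3],
                                [0, -4, -1, 5, 0]]))
y = drop_small(x, 3)

print(y.nnz)
print(y.toarray())
```

If modifying the matrix in place is acceptable, the copy can be dropped and the two statements applied directly to x, as in the session above.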