How can I give row and column names to Scipy's csr_matrix?
Question:
I don’t know if it’s possible, and it’s possibly a naive question, but how can I set the equivalent of R’s rownames() and colnames() on a scipy.sparse.csr.csr_matrix?
I saw that my_matrix.dtype.names doesn’t work here, and I can’t find any “index” equivalent for such a sparse matrix…
Moreover, pandas.sparse.* is not an option here, because of some open issue…
Thank you so much for your help,
Answers:
You’ll have to keep the names in separate arrays, because none of SciPy’s sparse formats supports named indexing. This might look like:
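import numpy as np
from scipy.sparse import csr_matrix

foo = csr_matrix(...)        # your sparse data
row_names = np.array(...)    # one name per row, in row order
col_names = np.array(...)    # one name per column, in column order
# index by name: np.where returns a 1-tuple holding the matching positions
row_idx, = np.where(row_names == "my row")
col_idx, = np.where(col_names == "my col")
foo[row_idx, col_idx]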
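If you look names up often, a dictionary from name to position avoids scanning the name arrays with np.where on every access. Below is a minimal sketch of that idea; `NamedCSR`, `get` and `rows` are hypothetical names made up for illustration (they are not part of SciPy), and the small matrix is toy data.

```python
from scipy.sparse import csr_matrix


class NamedCSR:
    """Hypothetical wrapper (not a SciPy API): a csr_matrix plus name lookups."""

    def __init__(self, matrix, row_names, col_names):
        self.matrix = csr_matrix(matrix)
        self.row_names = list(row_names)
        self.col_names = list(col_names)
        n_rows, n_cols = self.matrix.shape
        if len(self.row_names) != n_rows or len(self.col_names) != n_cols:
            raise ValueError("name lists must match the matrix shape")
        # name -> integer position, built once so each lookup is O(1)
        self._row_pos = {name: i for i, name in enumerate(self.row_names)}
        self._col_pos = {name: j for j, name in enumerate(self.col_names)}
        if len(self._row_pos) != n_rows or len(self._col_pos) != n_cols:
            raise ValueError("row and column names must be unique")

    def get(self, row_name, col_name):
        """Return one stored value addressed by row and column name."""
        return self.matrix[self._row_pos[row_name], self._col_pos[col_name]]

    def rows(self, names):
        """Return a new NamedCSR with only the named rows, in the given order."""
        idx = [self._row_pos[n] for n in names]  # KeyError for unknown names
        return NamedCSR(self.matrix[idx, :], names, self.col_names)


m = NamedCSR([[1, 0, 0, 3], [4, 0, 0, 5]], ["A", "B"], ["a", "b", "c", "d"])
print(m.get("B", "d"))                # prints 5
print(m.rows(["B"]).matrix.toarray())  # prints [[4 0 0 5]]
```

Because the sliced matrix and its names are built together in `rows`, they cannot drift out of sync, which is the main risk when names are kept in a separate array.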
Alternatively, see the package "SSparseMatrix", which is built on SciPy’s sparse matrices and adds named rows and columns.
Here is a creation and row-selection example (in a Python session):
>>> from SSparseMatrix import *
>>> mat = [[1, 0, 0, 3], [4, 0, 0, 5], [0, 3, 0, 5], [0, 0, 1, 0], [0, 0, 0, 5]]
>>> smat = SSparseMatrix(mat)
>>> smat.set_column_names(["a", "b", "c", "d"])
<5x4 SSparseMatrix (sparse matrix with named rows and columns) of type '<class 'numpy.int64'>'
with 8 stored elements in Compressed Sparse Row format, and fill-in 0.4>
>>> smat.set_row_names(["A", "B", "C", "D", "E"])
<5x4 SSparseMatrix (sparse matrix with named rows and columns) of type '<class 'numpy.int64'>'
with 8 stored elements in Compressed Sparse Row format, and fill-in 0.4>
>>> smat.print_matrix()
===================================
| a b c d
-----------------------------------
A | 1 . . 3
B | 4 . . 5
C | . 3 . 5
D | . . 1 .
E | . . . 5
===================================
>>> smat[["A","B"],:].print_matrix()
===================================
| a b c d
-----------------------------------
A | 1 . . 3
B | 4 . . 5
===================================
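If an extra dependency is not wanted, the same row selection can be approximated with SciPy and NumPy alone: map the requested names to positions, slice the matrix, and slice the name array with the same positions so names and rows stay aligned. This is a sketch only; `select_rows` and `show` are hypothetical helpers, and `show` is a rough imitation of the print_matrix layout above (it assumes short, single-character names for alignment).

```python
import numpy as np
from scipy.sparse import csr_matrix

# Same data and names as the SSparseMatrix session above
mat = csr_matrix([[1, 0, 0, 3], [4, 0, 0, 5], [0, 3, 0, 5], [0, 0, 1, 0], [0, 0, 0, 5]])
row_names = np.array(["A", "B", "C", "D", "E"])
col_names = np.array(["a", "b", "c", "d"])


def select_rows(m, names, wanted):
    """Slice rows by name; return the sub-matrix and its matching names."""
    pos = {name: i for i, name in enumerate(names)}
    idx = [pos[w] for w in wanted]  # KeyError if a requested name is missing
    return m[idx, :], names[idx]


def show(m, rnames, cnames):
    """Print the matrix with row/column names, using '.' for zeros."""
    dense = m.toarray()
    print("  | " + " ".join(cnames))
    for name, row in zip(rnames, dense):
        print(name + " | " + " ".join(str(v) if v != 0 else "." for v in row))


sub, sub_names = select_rows(mat, row_names, ["A", "B"])
show(sub, sub_names, col_names)
```

Here `sub` is still a csr_matrix, so it can be passed to anything that accepts SciPy sparse matrices, while `sub_names` carries the labels alongside it.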