easy sampling of vectors from a sparse matrix, and creating a new matrix from the sample (python)

Question:

This question has two parts (maybe one solution?):

Sample vectors from a sparse matrix: Is there an easy way to sample vectors from a sparse matrix?
When I try to sample rows using random.sample, I get a TypeError: sparse matrix length is ambiguous.

from random import sample
import numpy as np
from scipy.sparse import lil_matrix
K = 2
m = [[1,2],[0,4],[5,0],[0,8]]
sample(m,K)    #works OK
mm = np.array(m)
sample(mm,K)   #works OK (Python 2; Python 3 needs sample(list(mm),K))
sm = lil_matrix(m)
sample(sm,K)   #throws exception TypeError: sparse matrix length is ambiguous.

My current solution is to sample row indices from the number of rows in the matrix, then use getrow(), something like:

indxSampls = sample(range(sm.shape[0]), K)   # K distinct row indices
sampledRows = []
for i in indxSampls:
    sampledRows.append(sm.getrow(i))         # each entry is a 1 x n sparse row

Any other efficient/elegant ideas? The dense matrix size is 1000×30000 and could be larger.

Constructing a sparse matrix from a list of sparse vectors: Now imagine I have the list of sampled vectors sampledRows. How can I convert it to a sparse matrix without densifying it, converting it to a list of lists, and then converting that to a lil_matrix?

Asked By: ScienceFriction


Answers:

Try

sm[np.random.sample(sm.shape[0], K, replace=False), :]

This gets you out an LIL-format matrix with just K of the rows (in the order determined by the random.sample). I’m not sure it’s super-fast, but it can’t really be worse than manually accessing row by row like you’re currently doing, and probably preallocates the results.
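As a minimal sketch of this row-indexing idea on the question's small example: it assumes NumPy's Generator.choice for drawing the indices (an assumption, not the call shown above), and the seed is there only to make the run reproducible.

```python
import numpy as np
from scipy.sparse import lil_matrix, issparse

K = 2
sm = lil_matrix([[1, 2], [0, 4], [5, 0], [0, 8]])

# Assumption: a seeded Generator, used only so the sketch is reproducible.
rng = np.random.default_rng(0)

# Draw K distinct row indices, then select those rows in one indexing step.
row_idx = rng.choice(sm.shape[0], size=K, replace=False)
sampled = sm[row_idx, :]

print(sampled.shape)      # (K, number of columns)
print(issparse(sampled))  # the selection is still a sparse matrix
```

The rows of sampled follow the order of row_idx, so keeping row_idx around lets you map each sampled row back to its original position.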

Answered By: Danica

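The accepted answer to this question is outdated and no longer works. In current versions of numpy, np.random.sample takes only a size argument and returns uniform floats in [0, 1), so it cannot draw K row indices without replacement. Use np.random.choice in place of np.random.sample, e.g.: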

sm[np.random.choice(sm.shape[0], K, replace=False), :]

as opposed to:

sm[np.random.sample(sm.shape[0], K, replace=False), :]

Answered By: primaj
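
For the second part of the question (building a matrix from a list of already-sampled sparse rows), one option is scipy.sparse.vstack, which stacks sparse blocks without densifying them. The following is a sketch under the question's own setup; the names idx, sampled_rows and stacked are illustrative only.

```python
from random import sample
from scipy.sparse import lil_matrix, vstack

K = 2
sm = lil_matrix([[1, 2], [0, 4], [5, 0], [0, 8]])

# Sample K distinct row indices and collect the rows as 1 x n sparse matrices,
# as in the question's loop.
idx = sample(range(sm.shape[0]), K)
sampled_rows = [sm.getrow(i) for i in idx]

# Stack the sparse rows vertically; format="lil" asks for a LIL result.
stacked = vstack(sampled_rows, format="lil")

print(stacked.shape)  # (K, number of columns)
```

If the list of rows is not needed for anything else, indexing the matrix directly with an array of row indices gives the same result in one step.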