easy sampling of vectors from a sparse matrix, and creating a new matrix from the sample (python)
Question:
This question has two parts (maybe one solution?):
Sample vectors from a sparse matrix: Is there an easy way to sample vectors from a sparse matrix?
When I’m trying to sample lines using random.sample I get an TypeError: sparse matrix length is ambiguous.
from random import sample
import numpy as np
from scipy.sparse import lil_matrix
K = 2
m = [[1,2],[0,4],[5,0],[0,8]]
sample(m,K) #works OK
mm = np.array(m)
sample(m,K) #works OK
sm = lil_matrix(m)
sample(sm,K) #throws exception TypeError: sparse matrix length is ambiguous.
My current solution is to sample from the number of rows in the matrix, then use getrow(),, something like:
indxSampls = sample(range(sm.shape[0]), k)
sampledRows = []
for i in indxSampls:
sampledRows+=[sm.getrow(i)]
Any other efficient/elegant ideas? the dense matrix size is 1000×30000 and could be larger.
Constructing a sparse matrix from a list of sparse vectors: Now imagine I have the list of sampled vectors sampledRows, how can I convert it to a sparse matrix without densify it, convert it to list of lists and then convet it to lil_matrix?
Answers:
Try
sm[np.random.sample(sm.shape[0], K, replace=False), :]
This gets you out an LIL-format matrix with just K of the rows (in the order determined by the random.sample
). I’m not sure it’s super-fast, but it can’t really be worse than manually accessing row by row like you’re currently doing, and probably preallocates the results.
The accepted answer to this question is outdated and no longer works. With newer versions of numpy
, you should use np.random.choice
in place of np.random.sample
, e.g.:
sm[np.random.choice(sm.shape[0], K, replace=False), :]
as opposed to:
sm[np.random.sample(sm.shape[0], K, replace=False), :]
This question has two parts (maybe one solution?):
Sample vectors from a sparse matrix: Is there an easy way to sample vectors from a sparse matrix?
When I’m trying to sample lines using random.sample I get an TypeError: sparse matrix length is ambiguous.
from random import sample
import numpy as np
from scipy.sparse import lil_matrix
K = 2
m = [[1,2],[0,4],[5,0],[0,8]]
sample(m,K) #works OK
mm = np.array(m)
sample(m,K) #works OK
sm = lil_matrix(m)
sample(sm,K) #throws exception TypeError: sparse matrix length is ambiguous.
My current solution is to sample from the number of rows in the matrix, then use getrow(),, something like:
indxSampls = sample(range(sm.shape[0]), k)
sampledRows = []
for i in indxSampls:
sampledRows+=[sm.getrow(i)]
Any other efficient/elegant ideas? the dense matrix size is 1000×30000 and could be larger.
Constructing a sparse matrix from a list of sparse vectors: Now imagine I have the list of sampled vectors sampledRows, how can I convert it to a sparse matrix without densify it, convert it to list of lists and then convet it to lil_matrix?
Try
sm[np.random.sample(sm.shape[0], K, replace=False), :]
This gets you out an LIL-format matrix with just K of the rows (in the order determined by the random.sample
). I’m not sure it’s super-fast, but it can’t really be worse than manually accessing row by row like you’re currently doing, and probably preallocates the results.
The accepted answer to this question is outdated and no longer works. With newer versions of numpy
, you should use np.random.choice
in place of np.random.sample
, e.g.:
sm[np.random.choice(sm.shape[0], K, replace=False), :]
as opposed to:
sm[np.random.sample(sm.shape[0], K, replace=False), :]