Concatenate sparse matrices in Python using SciPy/NumPy

Question:

What is the most efficient way to concatenate sparse matrices in Python using SciPy/NumPy?

I tried the following:

>>> np.hstack((X, X2))
array([ <49998x70000 sparse matrix of type '<class 'numpy.float64'>'
        with 1135520 stored elements in Compressed Sparse Row format>,
        <49998x70000 sparse matrix of type '<class 'numpy.int64'>'
        with 1135520 stored elements in Compressed Sparse Row format>], 
       dtype=object)

I would like to use both predictors in a regression, but the current format is obviously not what I'm looking for. Would it be possible to get the following instead:

    <49998x140000 sparse matrix of type '<class 'numpy.float64'>'
     with 2271040 stored elements in Compressed Sparse Row format>

It is too large to be converted to a dense format.

Asked By: PascalVKooten


Answers:

You can use scipy.sparse.hstack to concatenate sparse matrices with the same number of rows (horizontal concatenation):

    from scipy.sparse import hstack
    hstack((X, X2))

Similarly, you can use scipy.sparse.vstack to concatenate sparse matrices with the same number of columns (vertical concatenation).

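Using numpy.hstack or numpy.vstack does not concatenate the data: NumPy treats each sparse matrix as an opaque object, so you get an object array holding the two sparse matrices, as in the output shown in the question.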
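
For illustration, here is a minimal sketch of both calls on two tiny matrices. The small X and X2 below are hypothetical stand-ins for the real data, and passing format="csr" is my own choice rather than part of the original answer: the default result format can differ between SciPy versions, so requesting CSR explicitly keeps the result in the format the question asks for.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack, vstack

# Hypothetical stand-ins for X and X2: same number of rows, different dtypes.
X = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0]]))
X2 = csr_matrix(np.array([[0, 4, 0],
                          [5, 0, 6]], dtype=np.int64))

# Horizontal concatenation: row counts must match, column counts add up.
combined = hstack((X, X2), format="csr")
print(combined.shape)  # (2, 6)
print(combined.nnz)    # 6
print(combined.dtype)  # float64

# Vertical concatenation: column counts must match, row counts add up.
stacked = vstack((X, X2), format="csr")
print(stacked.shape)   # (4, 3)
```

The result stays sparse throughout, so nothing is ever expanded to a dense array, and the mixed float64/int64 inputs are upcast to float64.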

Answered By: Saullo G. P. Castro