R internal handling of sparse matrices

Question:

I have been comparing the performance of several PCA implementations in both Python and R, and noticed an interesting behavior:
It seems impossible to compute the PCA of a sparse matrix in Python. The only candidate would be scikit-learn's TruncatedSVD, yet it does not support the mean-centering required to be equivalent to a covariance-based solution for PCA.
Their argument is that mean-centering would destroy the sparsity of the matrix. Other implementations, like Facebook's PCA algorithm or the PCA/randomPCA methods in scikit-learn, do not support sparse matrices for similar reasons.

While all of that makes sense to me, several R packages, such as irlba and rsvd, are able to handle sparse matrices (e.g. generated with rsparsematrix), and even allow for explicit center=TRUE arguments.

My question is how R handles this internally, as it seems to be vastly more efficient than the comparable Python implementations.
Does R still maintain sparsity by doing absolute scaling instead (which would theoretically falsify the results, but at least preserve sparsity)?
Or is there any way in which the mean can be stored explicitly for the zero values, stored only once rather than separately for every value?

To get the question taken off hold:
How does R internally store mean-centered matrices without exploding RAM usage?
Hope that is concise enough.

Asked By: dennlinger


Answers:

The key here is that the underlying implementation for the partial SVD (restarted Lanczos bidiagonalization C code) doesn't store the matrix. Instead, it records the result of applying the matrix, as a linear operator, to a small set of vectors obtained from the previous iteration.

Rather than explaining the concrete method used in the C code, which is quite advanced (see the paper for a description), I will explain it with a much simpler algorithm that captures the key idea of how the efficiency from sparsity is preserved: the power method (or, for its generalization to multiple eigenvalues, the subspace iteration method). The algorithm returns the largest eigenvalue of a matrix A by iteratively applying a linear operator and then normalizing the result (or orthogonalizing a small set of vectors, in the case of subspace iteration).

What you do at every iteration is

v=A*v
v=v/norm(v)
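
For concreteness, here is a minimal sketch of this power iteration in Python using scipy (scipy is not mentioned in the question or answer, and the matrix and sizes here are made up for illustration):

import numpy as np
import scipy.sparse as sp

# Random sparse test matrix, symmetrized so the dominant eigenvalue is real
A = sp.random(2000, 2000, density=0.01, format="csr", random_state=0)
A = (A + A.T) / 2

v = np.random.default_rng(0).standard_normal(A.shape[0])
for _ in range(500):
    v = A @ v                  # sparse matrix-vector product, cost O(nnz)
    v = v / np.linalg.norm(v)  # normalize

print(v @ (A @ v))             # Rayleigh quotient: estimate of the dominant eigenvalue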

The matrix multiplication step is the crucial one, so let's see what happens when we try the same thing with a centered A. The matrix formula for centered A (with center as the vector of column means and ones as the vector of ones) is:

A_center=A-ones*transpose(center)
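
This formula also shows why you would not want to form A_center explicitly: the rank-one term ones*transpose(center) is dense, so the centered matrix loses essentially all of its zeros. A small sketch of that effect (again assuming scipy, with arbitrary sizes):

import numpy as np
import scipy.sparse as sp

A = sp.random(1000, 500, density=0.01, format="csr", random_state=0)
center = np.asarray(A.mean(axis=0)).ravel()      # vector of column means
ones = np.ones(A.shape[0])

A_center = A.toarray() - np.outer(ones, center)  # explicit (dense) centering

print(A.nnz)                       # about 5,000 stored entries
print(np.count_nonzero(A_center))  # about 500,000 nonzeros: effectively dense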

So if we apply the iterative algorithm to this new matrix we would get

v=A*v-dotproduct(center,v)*ones
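
As a quick numerical check of this identity (a sketch with a made-up sparse matrix; scipy itself is not part of the post):

import numpy as np
import scipy.sparse as sp

A = sp.random(50, 20, density=0.1, format="csr", random_state=0)
center = np.asarray(A.mean(axis=0)).ravel()
ones = np.ones(A.shape[0])
v = np.random.default_rng(1).standard_normal(A.shape[1])

implicit = A @ v - np.dot(center, v) * ones            # sparse matvec plus rank-one correction
explicit = (A.toarray() - np.outer(ones, center)) @ v  # dense centered matrix

print(np.allclose(implicit, explicit))  # True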

Since A is sparse, we can use the sparse matrix-vector product for A*v, and the term -dotproduct(center,v)*ones just entails subtracting the scalar dotproduct(center,v) from every entry of the resulting vector, which is linear in the dimension of A.
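
To connect this back to the partial SVD, here is a sketch of how the same implicit-centering trick can be expressed in Python with scipy's LinearOperator and svds. This is only an illustration of the idea, not what irlba's C code literally does:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds

A = sp.random(3000, 300, density=0.02, format="csr", random_state=1)
center = np.asarray(A.mean(axis=0)).ravel()  # column means
ones = np.ones(A.shape[0])

def matvec(v):
    # (A - ones * transpose(center)) v = A v - dotproduct(center, v) * ones
    return A @ v - np.dot(center, v) * ones

def rmatvec(u):
    # transpose(A - ones * transpose(center)) u = A^T u - dotproduct(ones, u) * center
    return A.T @ u - np.dot(ones, u) * center

A_centered = LinearOperator(A.shape, matvec=matvec, rmatvec=rmatvec, dtype=A.dtype)

k = 5
s_implicit = svds(A_centered, k=k, return_singular_vectors=False)
s_dense = np.linalg.svd(A.toarray() - center, compute_uv=False)[:k]  # top k, dense reference

print(np.allclose(np.sort(s_implicit), np.sort(s_dense)))  # True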

Answered By: Juan Carlos Ramirez