Get U, Sigma, V* matrix from Truncated SVD in scikit-learn
Question:
I am using truncated SVD from the scikit-learn package.
In the definition of SVD, an original matrix A is approximated as a product A ≈ UΣV*, where U and V have orthonormal columns and Σ is a non-negative diagonal matrix.
I need to get the U, Σ and V* matrices.
Looking at the source code here, I found that V* is stored in the self.components_ field after calling fit_transform.
Is it possible to get U and Σ matrices?
My code:
import sklearn.decomposition as skd
import numpy as np
matrix = np.random.random((20,20))
trsvd = skd.TruncatedSVD(n_components=15)
transformed = trsvd.fit_transform(matrix)
VT = trsvd.components_
Answers:
One can use scipy.sparse.linalg.svds (for dense matrices you can use svd).
import numpy as np
from scipy.sparse.linalg import svds
matrix = np.random.random((20, 20))
num_components = 2
u, s, vt = svds(matrix, k=num_components)  # s typically comes back in ascending order
X = u.dot(np.diag(s))  # U * Sigma, as returned by TruncatedSVD (up to column order and sign)
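One detail worth checking with this approach: svds typically returns the singular values in ascending order, whereas TruncatedSVD reports them in descending order. The following is a minimal sketch, not a definitive recipe; the fixed seed and the names order, s_ref and u_sigma are illustrative assumptions. It reorders the svds output explicitly and compares it with the leading singular values from np.linalg.svd.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
matrix = rng.random((20, 20))
k = 2

u, s, vt = svds(matrix, k=k)

# Sort into descending order explicitly instead of relying on the order svds returns.
order = np.argsort(s)[::-1]
u, s, vt = u[:, order], s[order], vt[order, :]

# Reference: the k largest singular values from the dense SVD.
s_ref = np.linalg.svd(matrix, compute_uv=False)[:k]

# Scale each column of u by its singular value; same result as u @ np.diag(s).
u_sigma = u * s
```

After the reordering, u_sigma plays the role of the fit_transform output, up to the sign of each column.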
If you’re working with really big sparse matrices (perhaps you’re working with natural text), even scipy.sparse.linalg.svds might blow up your computer’s RAM. In such cases, consider the sparsesvd package, which uses SVDLIBC and is what gensim uses under the hood.
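import numpy as np
from scipy.sparse import csc_matrix
from sparsesvd import sparsesvd
k = 5  # number of singular triplets to compute
X = csc_matrix(np.random.random((30, 30)))  # sparsesvd expects a CSC sparse matrix
ut, s, vt = sparsesvd(X, k)
projected = (X * ut.T) / s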
Looking into the source via the link you provided, TruncatedSVD is basically a wrapper around sklearn.utils.extmath.randomized_svd; you can call it yourself like this:
from sklearn.utils.extmath import randomized_svd
U, Sigma, VT = randomized_svd(X,
                              n_components=15,
                              n_iter=5,
                              random_state=None)
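The snippet above assumes X already exists. As a self-contained sketch, the version below builds a small matrix of exact rank 3 (an assumption made here so that a rank-3 factorization can reproduce it) and shows the shapes that come back. The seed and matrix sizes are illustrative only.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.default_rng(0)
# Product of a (50, 3) and a (3, 30) matrix: rank 3 by construction.
X = rng.random((50, 3)) @ rng.random((3, 30))

U, Sigma, VT = randomized_svd(X, n_components=3, n_iter=5, random_state=0)

print(U.shape, Sigma.shape, VT.shape)  # (50, 3) (3,) (3, 30)

# Sigma is returned as a 1-D vector, so np.diag is needed to form the matrix.
X_rebuilt = U @ np.diag(Sigma) @ VT
```

Because X has rank 3 here, X_rebuilt matches X to numerical precision; for a general matrix it would only be an approximation.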
Suppose X is the input matrix on which we want to perform truncated SVD.
The commands below give U, Sigma and VT:
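from sklearn.decomposition import TruncatedSVD
# r is the number of components to keep (the rank of the approximation)
SVD = TruncatedSVD(n_components=r)
U = SVD.fit_transform(X) / SVD.singular_values_  # fit_transform returns U * Sigma
Sigma = SVD.singular_values_  # singular values as a 1-D vector
VT = SVD.components_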
To understand the above terms, please refer to http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
Just as a note: svd.transform(X) and svd.fit_transform(X) generate U * Sigma, svd.singular_values_ generates Sigma in vector form, and svd.components_ generates VT.
Maybe we can use
svd.transform(X).dot(np.linalg.inv(np.diag(svd.singular_values_)))
to get U because U * Sigma * Sigma ^ -1 = U * I = U.
From the source code, we can see that X_transformed, which is U * Sigma (here Sigma is a vector), is returned from the fit_transform method. So we can get:
svd = TruncatedSVD(k)
X_transformed = svd.fit_transform(X)
U = X_transformed / svd.singular_values_
Sigma_matrix = np.diag(svd.singular_values_)
VT = svd.components_
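A quick way to convince yourself that this division really yields U is to check that the result has orthonormal columns and that the singular values agree with a dense SVD. The sketch below does that on a small random matrix. Note the assumptions: algorithm="arpack" is chosen here so that the factors are accurate to numerical precision (the default is "randomized"), and the seed and sizes are arbitrary.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X = rng.random((20, 20))
k = 5

# Assumption of this sketch: use the ARPACK solver for an accurate result.
svd = TruncatedSVD(n_components=k, algorithm="arpack")
X_transformed = svd.fit_transform(X)      # U * Sigma, shape (20, k)

# Broadcasting divides each column by its own singular value.
U = X_transformed / svd.singular_values_
Sigma_matrix = np.diag(svd.singular_values_)
VT = svd.components_

# If U is correct, U^T U is (close to) the k x k identity.
gram = U.T @ U

# The k largest singular values from the dense SVD, for comparison.
s_dense = np.linalg.svd(X, compute_uv=False)[:k]
```

With the randomized solver the same steps apply, but the agreement is only approximate.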
Remark
Truncated SVD is an approximation: X ≈ X’ = UΣV*. We have X’V = UΣ. But what about XV? An interesting fact is that XV = X’V. This can be proved by comparing the full SVD form of X with the truncated SVD form of X’. Note that XV is just transform(X), so we can also get U by:
U = svd.transform(X) / svd.singular_values_
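The identity XV = X’V from the remark can also be checked numerically. This is only a sketch under stated assumptions: the ARPACK solver is used so the comparison holds to numerical precision, and the names X_approx, XV and XpV are introduced here for illustration.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X = rng.random((30, 12))
k = 4

svd = TruncatedSVD(n_components=k, algorithm="arpack")  # assumption: ARPACK for accuracy
U_sigma = svd.fit_transform(X)
V = svd.components_.T                  # shape (12, k), orthonormal columns

# X' = U Sigma V*, the rank-k approximation of X.
X_approx = U_sigma @ svd.components_

XV = X @ V                             # projection of the original matrix
XpV = X_approx @ V                     # projection of the approximation

# U recovered from transform(X), as in the line above.
U = svd.transform(X) / svd.singular_values_
```

XV and XpV agree even though X and X_approx differ, because the discarded components are orthogonal to the columns of V.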
I know this is an older question, but the correct version is:
U = svd.fit_transform(X) / svd.singular_values_  # fit_transform returns U * Sigma
Sigma = svd.singular_values_
VT = svd.components_
However, one thing to keep in mind is that U and VT are truncated, so without the discarded components it is not possible to recreate X exactly.
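To see how much is lost by truncation, one can rebuild the rank-k approximation with inverse_transform and measure the error. The sketch below assumes the ARPACK solver (so the truncated SVD is accurate) and uses arbitrary sizes. For an exact truncated SVD, the Frobenius-norm error equals the root-sum-of-squares of the dropped singular values.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X = rng.random((20, 20))
k = 5

svd = TruncatedSVD(n_components=k, algorithm="arpack")  # assumption: ARPACK for accuracy

# inverse_transform maps U * Sigma back to the original space: (U * Sigma) @ VT.
X_k = svd.inverse_transform(svd.fit_transform(X))

# Frobenius norm of what the truncation discarded.
error = np.linalg.norm(X - X_k)

# Root-sum-of-squares of the singular values beyond the first k.
s_all = np.linalg.svd(X, compute_uv=False)
expected_error = np.sqrt(np.sum(s_all[k:] ** 2))
```

The error shrinks as k grows and depends only on the singular values that were left out.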
If your matrices are not large, this can be computed directly with np.linalg.svd, since NumPy returns the singular values sorted in descending order: simply take the first k singular values from Σ, the first k columns of U, and the first k rows of Vh. (Use full_matrices=False to get the thin SVD if one of your dimensions is huge.)
import numpy as np
m = np.random.random((5,5))
u, s, vh = np.linalg.svd(m)
u2, s2, vh2 = u[:,:2], s[:2], vh[:2,:]
m2 = u2 @ np.diag(s2) @ vh2 # rank-2 approx
If your matrices are large, then the randomized algorithms provided by sklearn.decomposition.TruncatedSVD will compute the truncated SVD more efficiently.