np.linalg.svd is giving me wrong results

Question:

so I am trying to learn how SVD work to use it in PCA (principle component analysis), but the problem is that it seam I get wrong results, I tryied using np.linalg.svd and this is my code :

A = np.array([[2, 2],
              [1, 1]])
u, s, v = np.linalg.svd(A, full_matrices=False)
print(u)
print(s)
print(v)

and this is the result I got :

[[-0.89442719 -0.4472136 ]
 [-0.4472136   0.89442719]]
[3.16227766e+00 1.10062118e-17]
[[-0.70710678 -0.70710678]
 [ 0.70710678 -0.70710678]]

and I tried to get SVD decomposition on WolframAlpha and I got these results:

enter image description here

the magnitude of values seems correct but the sign is not correct, even I followed up with a video for a professor on MIT OpenCourseWare on youtube and he give these results :

enter image description here

which is the same magnitude answer but different signs, so what could possibly had gone wrong?

Asked By: abdo Salm

||

Answers:

Different conventions

It is a matter of a different convention for returning the matrix v:

From the documentation of numpy.linalg.svd (emphasis mine):

linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False)

Singular Value Decomposition.

When a is a 2D array, and full_matrices=False, then it is factorized as u @ np.diag(s) @ vh = (u * s) @ vh, where u and the Hermitian transpose of vh are 2D arrays with orthonormal columns and s is a 1D array of a’s singular values. When a is higher-dimensional, SVD is applied in stacked mode as explained below.

Returns: u{ (…, M, M), (…, M, K) } array Unitary array(s). The first
a.ndim - 2 dimensions have the same size as those of the input a. The
size of the last two dimensions depends on the value of full_matrices.
Only returned when compute_uv is True.

s(…, K) array Vector(s) with the singular values, within each vector
sorted in descending order. The first a.ndim - 2 dimensions have the
same size as those of the input a.

vh{ (…, N, N), (…, K, N) } array Unitary array(s). The first a.ndim
2 dimensions have the same size as those of the input a. The size of
the last two dimensions depends on the value of full_matrices. Only
returned when compute_uv is True.

So to summarize: given the SVD decomposition of x, x = u @ np.diag(s) @ vh the matrices returned by numpy.linalg.svd(x) are u, s and vh where vh is the hermitian conjugate of v. Other libraries and software will instead return v, causing the apparent inconsistency.

It is a shame that different libraries have different conventions, this also really tripped me up the first time I had to deal with this.

Intrinsic mathematical ambiguity on u and v

Furthermore the mathematics of the problem means that the matrices u and v are not uniquely determined.

In order to check that the SVD is correct you need to check that the matrices u and v are indeed unitary and that x = u @ np.diag(s) @ vh. If both conditions hold, than the SVD is correct, otherwise it isn’t.

Testing the numpy library

Here is some simple code to check that the implementation of the SVD library in numpy is indeed correct (of course it is, consider this a pedagogical exercise):

import numpy as np

A = np.array([[2, 2],
              [1, 1]])
u, s, vh = np.linalg.svd(A, full_matrices=False)

smat = np.diag(s)

def is_unary(A):
    return np.allclose(A @ A.T, np.eye(A.shape[0]))

print(f"Is u unary? {is_unary(u)}")
print(f"Is vh unary? {is_unary(vh)}")
print(f"original matrix A")
print(A)
print("Reconstructed matrix A")
print(u @ smat @ vh)

def check_svd(A, u, s, vh):
    smat = np.diag(s)
    c1 = is_unary(u)
    c2 = is_unary(vh)
    c3 = np.allclose(A, u @ smat @ vh)
    success = c1 and c2 and c3
    if not success:
        raise Exception("numpy SVD failed")

for i in range(100):
    A = np.random.rand(100, 100)
    u, s, vh = np.linalg.svd(A, full_matrices=False)
    check_svd(A, u, s, vh)

print("All tests passed")
Answered By: Caridorc