LinAlgError: SVD did not converge in matplotlib PCA determination
Question:
code :
import numpy
from matplotlib.mlab import PCA
file_name = "store1_pca_matrix.txt"
ori_data = numpy.loadtxt(file_name,dtype='float', comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)
result = PCA(ori_data)
This is my code. Although my input matrix contains no NaN or inf values, I still get the error below.
raise LinAlgError("SVD did not converge") LinAlgError: SVD did not converge
What is the problem?
Answers:
This can happen when there are inf or NaN values in the data.
If your data is in a pandas DataFrame, you can drop rows containing NaN values like this:
ori_data.dropna(inplace=True)
(Note that dropna is a pandas method; it does not exist on the plain NumPy array returned by numpy.loadtxt.)
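Since ori_data in the question comes from numpy.loadtxt and is a plain NumPy array, the equivalent cleanup can be sketched with a finiteness mask (the sample values below are made up):

```python
import numpy as np

# Made-up data containing a NaN and an inf.
ori_data = np.array([[1.0, 2.0],
                     [np.nan, 3.0],
                     [4.0, np.inf],
                     [5.0, 6.0]])

# Keep only rows whose entries are all finite (drops NaN and +/-inf).
clean = ori_data[np.isfinite(ori_data).all(axis=1)]
print(clean)  # only the first and last rows remain
```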
I do not have an answer to this question, but I have a reproduction scenario with no
NaNs and infs. Unfortunately the dataset is pretty large (96 MB gzipped).
# Python 2 code: StringIO and urllib2 were replaced by io and urllib.request in Python 3.
import numpy as np
from StringIO import StringIO
from scipy import linalg
import urllib2
import gzip

url = 'http://physics.muni.cz/~vazny/gauss/X.gz'
X = np.loadtxt(gzip.GzipFile(fileobj=StringIO(urllib2.urlopen(url).read())), delimiter=',')
linalg.svd(X, full_matrices=False)
which raises:
LinAlgError: SVD did not converge
on:
>>> np.__version__
'1.8.1'
>>> import scipy
>>> scipy.__version__
'0.10.1'
but did not raise an exception on:
>>> np.__version__
'1.8.2'
>>> import scipy
>>> scipy.__version__
'0.14.0'
This may be due to the singular nature of your input data matrix (which you are feeding to PCA).
I am using numpy 1.11.0. If the matrix has more than one eigenvalue equal to 0, then 'SVD did not converge' is raised.
Even if your data is correct, this can happen because the process runs out of memory. In my case, moving from a 32-bit machine to a 64-bit machine with more memory solved the problem.
I know this post is old, but in case someone else encounters the same problem: @jseabold was right that the problem is NaN or inf, and the OP was probably right that the data did not contain NaN or inf values. However, if one of the columns in ori_data always has the same value, the data will end up with NaNs, since the implementation of PCA in mlab normalizes the input data by doing
ori_data = (ori_data - mean(ori_data)) / std(ori_data)
and the standard deviation of a constant column is 0, so the division produces NaNs.
The solution is to do:
result = PCA(ori_data, standardize=False)
In this way, only the mean will be subtracted without dividing by the standard deviation.
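To see why a constant column breaks standardization, here is a minimal sketch with made-up numbers: the column's standard deviation is 0, and 0/0 yields NaN.

```python
import numpy as np

# Made-up matrix whose second column is constant.
ori_data = np.array([[1.0, 7.0],
                     [2.0, 7.0],
                     [3.0, 7.0]])

# The standardization step described above, applied per column.
with np.errstate(invalid='ignore'):  # silence the 0/0 warning
    standardized = (ori_data - ori_data.mean(axis=0)) / ori_data.std(axis=0)

print(standardized[:, 1])  # all NaN -- these NaNs then make the SVD fail
```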
Following on @c-chavez's answer, what worked for me was first replacing inf and -inf with NaN, then removing the NaNs.
For example:
data = data.replace(np.inf, np.nan).replace(-np.inf, np.nan).dropna()
If there are no inf or NaN values, it is possibly a memory issue. Please try on a machine with more RAM.
This happened to me when I accidentally resized an image dataset to (0, 64, 3). Try checking the shape of your dataset to see if one of the dimensions is 0.
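A quick sanity check along these lines (the shapes are hypothetical):

```python
import numpy as np

# Hypothetical image batch accidentally resized to zero samples.
images = np.zeros((0, 64, 3))

# SVD has nothing to decompose if any dimension is 0.
has_empty_dim = 0 in images.shape
print(has_empty_dim, images.shape)  # True (0, 64, 3)
```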
I have hit this error several times:
- If the length of the data is 1: then nothing can be fitted.
- If a value is infinity: did you divide by 0 somewhere in your processing?
- If a value is None: this is very common.
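The checklist above can be rolled into one validation helper (the function name is made up; the checks mirror the three bullet points):

```python
import numpy as np

def check_before_svd(data):
    """Validate data before handing it to PCA/SVD."""
    a = np.asarray(data, dtype=float)  # None entries become NaN here
    if len(a) < 2:
        raise ValueError("too little data to fit anything: shape %s" % (a.shape,))
    if np.isinf(a).any():
        raise ValueError("inf present -- divided by 0 somewhere upstream?")
    if np.isnan(a).any():
        raise ValueError("NaN present -- missing values or None entries?")
    return a

check_before_svd([[1.0, 2.0], [3.0, 4.0]])  # passes silently
```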