Calculate Matrix Rank using scipy
Question:
I’d like to calculate the mathematical rank of a matrix using scipy. The most obvious function numpy.rank
calculates the dimension of an array (ie. scalars have dimension 0, vectors 1, matrices 2, etc…). I am aware that the numpy.linalg.lstsq
module has this capability, but I was wondering if such a fundamental operation is built into the matrix class somewhere.
Here is an explicit example:
from numpy import matrix, rank
A = matrix([[1,3,7],[2,8,3],[7,8,1]])
print rank(A)
This gives 2
the dimension, where I’m looking for an answer of 3
.
Answers:
I don’t know about Numpy in particular, but that’s unlikely to be a built-in operation on a matrix; it involves fairly intensive numerical computations (and associated concerns about floating-point roundoff error and so forth) and threshold selections that may or may not be appropriate in a given context, and algorithm selection is important to computing it accurately and quickly.
Things that are built into the basic classes tend to be things that can be performed in a unique and straightforward manner, such as matrix multiplications at the most complex.
The linear algebra functions are generally grouped in numpy.linalg
. (They’re also available from scipy.linalg
, which has more functionality.) This allows polymorphism: the functions can accept any of the types that SciPy handles.
So, yes, the numpy.linalg.lstsq
function does what you’re asking. Why is that insufficient?
This answer is out of date.
The answer is no—there is currently no function dedicated to calculating the matrix rank of an array/matrix in scipy. Adding one has been discussed before, but if it’s going to happen, I don’t believe it has yet.
To provide a rough code snippet for people who need to get this done in practice. Feel free to improve.
u, s, v = np.linalg.svd(A)
rank = np.sum(s > 1e-10)
If numpy
does not offer a rank facility, why don’t you write your own?
An efficient way to compute the rank is via the Singular Value Decomposition – the rank of the matrix is equal to the number of non-zero singular values.
def rank(A, eps=1e-12):
u, s, vh = numpy.linalg.svd(A)
return len([x for x in s if abs(x) > eps])
Notice that eps
depends in your application – most would agree that 1e-12 corresponds to zero, but you may witness numerical instability even for eps=1e-9.
Using your example, the answer is three. If you change the second row to [2, 6, 14]
(linearly dependent with row one) the answer is two (the “zero” eigenvalue is 4.9960E-16)
Numpy provides numpy.linalg.matrix_rank()
:
>>> import numpy
>>> numpy.__version__
'1.5.1'
>>> A = numpy.matrix([[1,3,7],[2,8,3],[7,8,1]])
>>> numpy.linalg.matrix_rank(A)
3
scipy
now contains an efficient interpolative method for estimating the rank of a matrix/LinearOperator using random methods, which can often be accurate enough:
>>> from numpy import matrix
>>> A = matrix([[1,3,7],[2,8,3],[7,8,1]], dtype=float) # doesn't accept int
>>> import scipy.linalg.interpolative as sli
>>> sli.estimate_rank(A, eps=1e-10)
3
I’d like to calculate the mathematical rank of a matrix using scipy. The most obvious function numpy.rank
calculates the dimension of an array (ie. scalars have dimension 0, vectors 1, matrices 2, etc…). I am aware that the numpy.linalg.lstsq
module has this capability, but I was wondering if such a fundamental operation is built into the matrix class somewhere.
Here is an explicit example:
from numpy import matrix, rank
A = matrix([[1,3,7],[2,8,3],[7,8,1]])
print rank(A)
This gives 2
the dimension, where I’m looking for an answer of 3
.
I don’t know about Numpy in particular, but that’s unlikely to be a built-in operation on a matrix; it involves fairly intensive numerical computations (and associated concerns about floating-point roundoff error and so forth) and threshold selections that may or may not be appropriate in a given context, and algorithm selection is important to computing it accurately and quickly.
Things that are built into the basic classes tend to be things that can be performed in a unique and straightforward manner, such as matrix multiplications at the most complex.
The linear algebra functions are generally grouped in numpy.linalg
. (They’re also available from scipy.linalg
, which has more functionality.) This allows polymorphism: the functions can accept any of the types that SciPy handles.
So, yes, the numpy.linalg.lstsq
function does what you’re asking. Why is that insufficient?
This answer is out of date.
The answer is no—there is currently no function dedicated to calculating the matrix rank of an array/matrix in scipy. Adding one has been discussed before, but if it’s going to happen, I don’t believe it has yet.
To provide a rough code snippet for people who need to get this done in practice. Feel free to improve.
u, s, v = np.linalg.svd(A)
rank = np.sum(s > 1e-10)
If numpy
does not offer a rank facility, why don’t you write your own?
An efficient way to compute the rank is via the Singular Value Decomposition – the rank of the matrix is equal to the number of non-zero singular values.
def rank(A, eps=1e-12):
u, s, vh = numpy.linalg.svd(A)
return len([x for x in s if abs(x) > eps])
Notice that eps
depends in your application – most would agree that 1e-12 corresponds to zero, but you may witness numerical instability even for eps=1e-9.
Using your example, the answer is three. If you change the second row to [2, 6, 14]
(linearly dependent with row one) the answer is two (the “zero” eigenvalue is 4.9960E-16)
Numpy provides numpy.linalg.matrix_rank()
:
>>> import numpy
>>> numpy.__version__
'1.5.1'
>>> A = numpy.matrix([[1,3,7],[2,8,3],[7,8,1]])
>>> numpy.linalg.matrix_rank(A)
3
scipy
now contains an efficient interpolative method for estimating the rank of a matrix/LinearOperator using random methods, which can often be accurate enough:
>>> from numpy import matrix
>>> A = matrix([[1,3,7],[2,8,3],[7,8,1]], dtype=float) # doesn't accept int
>>> import scipy.linalg.interpolative as sli
>>> sli.estimate_rank(A, eps=1e-10)
3