What is the difference between numpy.linalg.lstsq and scipy.linalg.lstsq?
Question:
lstsq tries to solve Ax=b by minimizing |b - Ax|. Both scipy and numpy provide a linalg.lstsq function with a very similar interface. The documentation does not say which algorithm is used, either for scipy.linalg.lstsq or for numpy.linalg.lstsq, but the two functions seem to do pretty much the same thing.
The implementation seems to be different for scipy.linalg.lstsq and numpy.linalg.lstsq. Both seem to use LAPACK, and both algorithms seem to use an SVD.
Where is the difference? Which one should I use?
Note: do not confuse linalg.lstsq with scipy.optimize.leastsq, which can also solve non-linear optimization problems.
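As a minimal sketch of the "very similar interface" mentioned in the question, the snippet below calls both functions on the same well-conditioned problem. The matrix, the right-hand side, and the random seed are illustrative assumptions, and rcond=None is passed to Numpy explicitly so the call does not depend on the Numpy version.

```python
import numpy as np
import scipy.linalg

# Illustrative data (an assumption): an overdetermined, full-rank problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)

# Both functions return (solution, residuals, rank, singular values).
x_np, res_np, rank_np, sv_np = np.linalg.lstsq(A, b, rcond=None)
x_sp, res_sp, rank_sp, sv_sp = scipy.linalg.lstsq(A, b)

print(np.allclose(x_np, x_sp))  # solutions agree on a well-conditioned problem
print(rank_np, rank_sp)
```

On a well-conditioned matrix like this one, the two calls should be interchangeable; the differences discussed in the answers only show up for ill-conditioned or rank-deficient A.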
Answers:
If I read the source code right (Numpy 1.8.2, Scipy 0.14.1), numpy.linalg.lstsq() uses the LAPACK routine xGELSD and scipy.linalg.lstsq() uses xGELSS.
The LAPACK Manual Sec. 2.4 states
The subroutine xGELSD is significantly faster than its older counterpart xGELSS, especially for large problems, but may require somewhat more workspace depending on the matrix dimensions.
That means that Numpy is faster but uses more memory.
Update August 2017:
Scipy now uses xGELSD by default https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lstsq.html
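In Scipy versions that expose it (0.17 and later, as far as I know), the LAPACK routine can be chosen with the lapack_driver argument. The sketch below solves one illustrative problem with both routines named above; the data and seed are assumptions, and no timing claim is made here.

```python
import numpy as np
import scipy.linalg

# Illustrative data (an assumption): a tall, full-rank matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
b = rng.standard_normal(200)

# Solve the same problem with the newer (gelsd) and older (gelss) routine.
solutions = {}
for driver in ("gelsd", "gelss"):
    x, residues, rank, sv = scipy.linalg.lstsq(A, b, lapack_driver=driver)
    solutions[driver] = x

# Both drivers are SVD based, so they should agree to rounding error.
print(np.allclose(solutions["gelsd"], solutions["gelss"]))
```

To compare speed on your own problem sizes, wrapping each call in timeit is probably the simplest approach.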
Numpy 1.13 – June 2017
As of Numpy 1.13 and Scipy 0.19, both scipy.linalg.lstsq() and numpy.linalg.lstsq() call by default the same LAPACK code DGELSD (see LAPACK documentation).
However, a current important difference between the two functions is in the adopted default RCOND LAPACK parameter (called rcond by Numpy and cond by Scipy), which defines the threshold for singular values.
Scipy uses a good and robust default threshold RCOND=eps*max(A.shape)*S[0], where S[0] is the largest singular value of A, while Numpy uses a default threshold RCOND=-1, which corresponds to setting in LAPACK the threshold equal to the machine precision, regardless of the values of A.
Numpy’s default approach is basically useless in realistic applications and will generally result in a very degenerate solution when A is nearly rank deficient, wasting the accuracy of the singular value decomposition (SVD) used by DGELSD. This implies that in Numpy the optional parameter rcond should always be used.
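To illustrate why the threshold matters, here is a small sketch that builds a nearly rank-deficient matrix with known singular values and solves it with two explicit rcond values. The matrix construction, the singular values, and the two thresholds (1e-15 and 1e-8) are assumptions chosen for the demonstration, not the library defaults.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 50x3 matrix with prescribed singular values; the last one is tiny,
# so A is nearly rank deficient.
U, _ = np.linalg.qr(rng.standard_normal((50, 3)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
singular_values = np.array([1.0, 1e-1, 1e-12])
A = U @ np.diag(singular_values) @ V.T
b = rng.standard_normal(50)

# Very small threshold: the tiny singular value is kept and inverted,
# which amplifies the corresponding component of b enormously.
x_small, _, rank_small, _ = np.linalg.lstsq(A, b, rcond=1e-15)

# Larger threshold: the tiny singular value is treated as zero.
x_large, _, rank_large, _ = np.linalg.lstsq(A, b, rcond=1e-8)

print(rank_small, np.linalg.norm(x_small))
print(rank_large, np.linalg.norm(x_large))
```

With the small threshold the reported rank is 3 and the solution norm blows up; with the larger one the rank drops to 2 and the solution stays moderate.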
Update: Numpy 1.14 – January 2018
I reported the incorrect default of rcond (see the section above) in numpy.linalg.lstsq(), and the function now raises a FutureWarning in Numpy 1.14 (see Future Changes).
The future behaviour will be identical in scipy.linalg.lstsq() and in numpy.linalg.lstsq(). In other words, Scipy and Numpy will not only use the same LAPACK code, but also the same defaults.
To start using the proper (i.e. future) default in Numpy 1.14, one should call numpy.linalg.lstsq() with an explicit rcond=None.
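As a quick check of what rcond=None means, the sketch below compares it with an explicitly computed threshold of machine precision times max(M, N), which is how the Numpy documentation describes the new default. It assumes Numpy 1.14 or later; the data and seed are illustrative.

```python
import numpy as np

# Illustrative data (an assumption).
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 4))
b = rng.standard_normal(30)

# Threshold described in the Numpy docs for rcond=None.
eps = np.finfo(A.dtype).eps
explicit_rcond = eps * max(A.shape)

x_none, _, rank_none, _ = np.linalg.lstsq(A, b, rcond=None)
x_explicit, _, rank_explicit, _ = np.linalg.lstsq(A, b, rcond=explicit_rcond)

print(np.allclose(x_none, x_explicit), rank_none == rank_explicit)
```

Passing rcond=None explicitly also silences the FutureWarning in the Numpy versions that emit it.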
numpy.linalg.lstsq and scipy.linalg.lstsq are two different functions that can be used to solve linear least squares problems. The main difference between the two functions is that scipy.linalg.lstsq exposes more options than numpy.linalg.lstsq.
Here are some key differences between the two functions:
scipy.linalg.lstsq has historically handled rank-deficient matrices (i.e., matrices with linearly dependent rows or columns) more robustly than numpy.linalg.lstsq, whose old default rcond could produce unexpected results on such matrices.
scipy.linalg.lstsq has additional optional parameters that allow you to specify the behavior of the solver in more detail. For example, lapack_driver selects the LAPACK routine (gelsd, gelsy, or gelss), and check_finite, overwrite_a, and overwrite_b trade safety checks and input preservation for speed.
Neither function handles bounds on the variables or linear equality and inequality constraints; both are limited to basic linear least squares problems of the form Ax = b. Bounded problems are handled by scipy.optimize.lsq_linear instead.
Overall, scipy.linalg.lstsq is the more configurable function, and it is a reasonable choice if you need control over the solver. numpy.linalg.lstsq is a simpler function that is sufficient for basic linear least squares problems and avoids a dependency on Scipy.