Gaussian Process Regression: standard deviation meaning

Question:

In the following code about the Gaussian Process Regression (GPR):

from sklearn.datasets import make_friedman2
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel

X, y = make_friedman2(n_samples=500, noise=0, random_state=0)

kernel = DotProduct() + WhiteKernel()
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

print(gpr.score(X, y))
print(gpr.predict(X[:2, :], return_std=True))

What is the meaning of the “standard deviation” obtained from: gpr.predict(X[:2,:], return_std=True)?

For comparison, Support Vector Regression (SVR) does not offer this option in its predict method; when I use SVR I usually obtain a standard error from cross-validation instead.

I use GPR in Bayesian Optimization, which is why I need to know where this standard deviation comes from.

Asked By: maicon


Answers:

Gaussian Processes are Bayesian, so fitting the regression yields a posterior distribution over functions rather than a single point estimate. From that posterior you get a full predictive distribution at every query point. Passing return_std=True makes predict return, alongside the predictive mean, the standard deviation of that predictive distribution for each query point. These values quantify the model's uncertainty at each prediction, informed by the strength of the evidence (training data) provided.
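For illustration, here is a minimal sketch (reusing the kernel and data from the question) that turns the returned standard deviations into approximate 95% prediction intervals; the 1.96 factor is just the usual Gaussian quantile and is my addition, not something the question asked for:

import numpy as np
from sklearn.datasets import make_friedman2
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel

X, y = make_friedman2(n_samples=500, noise=0, random_state=0)
gpr = GaussianProcessRegressor(kernel=DotProduct() + WhiteKernel(),
                               random_state=0).fit(X, y)

# Predictive mean and standard deviation at the first two inputs
mean, std = gpr.predict(X[:2, :], return_std=True)

# Approximate 95% interval around each prediction
lower, upper = mean - 1.96 * std, mean + 1.96 * std
print(np.column_stack([lower, mean, upper]))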

Cross-validation is typically used when such quantities cannot be computed analytically. A key advantage of Gaussian Process methods is their tractability: the predictive distribution is available in closed form.
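Concretely, the predictive variance at a test point x* is k(x*, x*) - k(x*, X) (K + alpha*I)^{-1} k(X, x*), where K is the kernel matrix over the training inputs X. Below is a minimal sketch of that computation by hand, assuming scikit-learn's defaults (normalize_y=False, alpha=1e-10) and the fitted attributes kernel_ and X_train_; it should agree with return_std up to numerical precision:

import numpy as np
from sklearn.datasets import make_friedman2
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel

X, y = make_friedman2(n_samples=500, noise=0, random_state=0)
gpr = GaussianProcessRegressor(kernel=DotProduct() + WhiteKernel(),
                               random_state=0).fit(X, y)
X_test = X[:2, :]

# Gram matrix of the fitted kernel over the training data, with the
# jitter term alpha added to the diagonal (as done during fitting)
K = gpr.kernel_(gpr.X_train_) + gpr.alpha * np.eye(len(gpr.X_train_))
k_star = gpr.kernel_(X_test, gpr.X_train_)   # cross-covariances k(x*, X)
prior_var = gpr.kernel_.diag(X_test)         # k(x*, x*)

# Posterior variance: k(x*, x*) - k(x*, X) (K + alpha I)^{-1} k(X, x*)
posterior_var = prior_var - np.sum(k_star * np.linalg.solve(K, k_star.T).T,
                                   axis=1)
manual_std = np.sqrt(posterior_var)

_, sklearn_std = gpr.predict(X_test, return_std=True)
print(manual_std)     # should match sklearn_std up to numerical precision
print(sklearn_std)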

Answered By: merv

How does GPR calculate the standard deviation for each label? What is the math working behind the scenes?

Answered By: SOUVIK MANNA