Gaussian Process Regression: tune hyperparameters based on validation set

Question:

In the standard scikit-learn implementation of Gaussian Process Regression (GPR), the hyperparameters (of the kernel) are chosen based on the training set.

Is there an easy-to-use implementation of GPR (in Python) where the hyperparameters (of the kernel) are chosen based on a separate validation set? Cross-validation would also be a nice alternative for finding suitable hyperparameters (ones that are optimized to perform well on multiple train-val splits). (I would prefer a solution that builds on the scikit-learn GPR.)
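For the cross-validation alternative, a minimal sketch (the parameter grid and noise level below are made up for illustration, and GridSearchCV's default R^2 score is used rather than the log-likelihood metric described below, which would need a custom scorer) could look like this:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X, y = np.random.random((200, 5)), np.random.random(200)

# optimizer=None keeps each candidate kernel's hyperparameters fixed during fitting,
# so the cross-validation score is what selects them
param_grid = {"kernel": [RBF(length_scale=l) for l in (0.1, 0.3, 1.0, 3.0)]}
search = GridSearchCV(GaussianProcessRegressor(alpha=1e-2, optimizer=None), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)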

In detail: a set of hyperparameters theta should be found that performs well with respect to the following metric:
Calculate the posterior GP based on the training data (given the prior GP with hyperparameters theta). Then evaluate the negative log likelihood of the validation data with respect to the posterior.
This negative log likelihood should be minimal for theta.

In other words, I want to find theta such that P[ valData | trainData, theta ] is maximal. A non-exact approximation that might be sufficient would be to find theta such that sum_i log(P[ valData_i | trainData, theta ]) is maximal, where P[ valData_i | trainData, theta ] is the Gaussian marginal posterior density of a single validation point valData_i given the training data, under the prior GP with hyperparameters theta.

Edit: Since P[ valData | trainData, theta ] has been implemented recently (see my answer), the easier-to-implement approximation of P[ valData | trainData, theta ] is no longer needed.
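To make the target concrete, here is a minimal sketch of this objective (the RBF kernel with a single length scale and the fixed noise level are illustrative assumptions): a GPR with fixed hyperparameters is conditioned on the training data, and the joint posterior predictive negative log density of the validation data is evaluated and minimized over theta.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X_train, y_train = np.random.random((50, 2)), np.random.random(50)
X_val, y_val = np.random.random((20, 2)), np.random.random(20)
noise = 1e-2  # assumed fixed observation-noise variance

def val_nll(log_length_scale):
    # posterior GP given the training data, hyperparameters held fixed (optimizer=None)
    kernel = RBF(length_scale=np.exp(log_length_scale[0]))
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=noise, optimizer=None)
    gpr.fit(X_train, y_train)
    # joint posterior predictive of the validation targets; predict() returns the
    # noise-free latent covariance, so the observation noise is added back
    mean, cov = gpr.predict(X_val, return_cov=True)
    cov = cov + noise * np.eye(len(y_val))
    # negative log likelihood of the validation data under this posterior
    return -multivariate_normal.logpdf(y_val, mean=mean, cov=cov)

res = minimize(val_nll, x0=np.zeros(1), method="Nelder-Mead")
print("selected length scale:", np.exp(res.x[0]))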

Asked By: Jakob


Answers:

I would do it this way: first I would fit a scikit-learn GPR with the default kernel on my validation set; then I would fit another GPR on my training set, passing the fitted kernel instance (kernel_) of the previous GPR as its kernel:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X_val = np.random.random((100, 5))
y_val = np.random.random((100,))

X_train = np.random.random((1000, 5))
y_train = np.random.random((1000,))

# optimize the kernel hyperparameters on the validation set
gpr_val = GaussianProcessRegressor().fit(X_val, y_val)

# reuse the fitted kernel; optimizer=None keeps its hyperparameters fixed
# instead of re-optimizing them on the training set
gpr_train = GaussianProcessRegressor(kernel=gpr_val.kernel_, optimizer=None).fit(X_train, y_train)
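Note that GaussianProcessRegressor re-optimizes the kernel hyperparameters during fit by default, so optimizer=None in the second fit is what actually keeps the validation-tuned hyperparameters fixed instead of using them only as the initial guess.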

Two days ago, a paper was presented at ICML that implements my suggestion of splitting the training data into a hyperparameter-training set D<m and a hyperparameter-validation set D>=m, and selecting the hyperparameters theta that maximize p(D>=m | D<m, theta):
https://proceedings.mlr.press/v162/lotfi22a.html.
This paper won an ICML Outstanding Paper Award. The authors discuss the advantages over standard maximization of the marginal likelihood and provide some code: https://github.com/Sanaelotfi/Bayesian_model_comparison
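For a GP, this conditional marginal likelihood can also be computed with plain scikit-learn via the identity log p(D>=m | D<m, theta) = log p(D | theta) - log p(D<m | theta). A minimal sketch (the data, split point m, kernel and noise level are illustrative; theta is on scikit-learn's log scale):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X, y = np.random.random((200, 5)), np.random.random(200)
m = 150  # D<m = first m points, D>=m = the rest

def conditional_lml(theta):
    # log p(D | theta): condition on all data, hyperparameters fixed at theta
    gpr_all = GaussianProcessRegressor(kernel=RBF(), alpha=1e-2, optimizer=None).fit(X, y)
    # log p(D<m | theta): condition on the first m points only
    gpr_head = GaussianProcessRegressor(kernel=RBF(), alpha=1e-2, optimizer=None).fit(X[:m], y[:m])
    return (gpr_all.log_marginal_likelihood(theta)
            - gpr_head.log_marginal_likelihood(theta))

print(conditional_lml(np.log([0.5])))  # e.g. an RBF length scale of 0.5

Maximizing conditional_lml over theta (e.g. with scipy.optimize) would then play the role that maximizing the standard marginal likelihood plays in scikit-learn's built-in fitting.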

I hope that someone implements this (often superior) option for hyperparameter tuning in standard GPR implementations such as the one in scikit-learn.

Answered By: Jakob