StackingCVClassifier pre-trained base models

Question:

I haven’t been able to find any information on whether or not StackingCVClassifiers accept pre-trained models.

Asked By: BenjaminLi


Answers:

Probably not. StackingCVClassifier and StackingClassifier currently take a list of unfitted base estimators, then call fit and predict on them internally.
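To illustrate why pre-trained models don't help here, a minimal sketch with scikit-learn's StackingClassifier (the estimator names "lr" and "dt" are arbitrary labels chosen for this example):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# StackingClassifier clones and refits every base estimator inside .fit(),
# so passing already-fitted models does not skip their training.
stack = StackingClassifier(
    estimators=[("lr", LogisticRegression()), ("dt", DecisionTreeClassifier())],
    final_estimator=LogisticRegression(),
)
stack.fit(X, y)  # base estimators are (re)fit here via cross-validation
```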

It’s pretty straightforward to implement this yourself, though. The main idea behind stacking is to fit a "final model" on the predictions of earlier models.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000)
X_train, X_test, y_train, y_test = train_test_split(X, y)

Here X_train is (750, 100) and X_test is (250, 100).

We’ll emulate three "pre-trained" models by fitting them on X_train, y_train, then produce predictions for both the training set and the test set:

from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.neighbors import KNeighborsRegressor

# Emulate "pre-trained" models
models = [RidgeCV(), LassoCV(), KNeighborsRegressor(n_neighbors=5)]

X_train_new = np.zeros((X_train.shape[0], len(models)))    # (750, 3)
X_test_new = np.zeros((X_test.shape[0], len(models)))      # (250, 3)

for i, model in enumerate(models):
    model.fit(X_train, y_train)
    X_train_new[:, i] = model.predict(X_train)
    X_test_new[:, i] = model.predict(X_test)

The final model is fit on X_train_new and can make predictions from any (N, 3) matrix produced by our base models:

from sklearn.ensemble import GradientBoostingRegressor

clf = GradientBoostingRegressor()
clf.fit(X_train_new, y_train)
clf.score(X_test_new, y_test)
# 0.9998247
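Note that unseen samples must go through the same two-step pipeline: base models first, then the final model. A self-contained sketch of the full flow, ending with predictions on hypothetical new data (X_new is made up for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the "pre-trained" base models.
models = [RidgeCV(), LassoCV(), KNeighborsRegressor(n_neighbors=5)]
for model in models:
    model.fit(X_train, y_train)

# Stack the base-model predictions column-wise for the final model.
X_train_new = np.column_stack([m.predict(X_train) for m in models])
clf = GradientBoostingRegressor().fit(X_train_new, y_train)

# New samples: base models first, then the final model.
X_new = np.random.randn(5, X.shape[1])
X_new_stacked = np.column_stack([m.predict(X_new) for m in models])  # (5, 3)
y_new_pred = clf.predict(X_new_stacked)                              # (5,)
```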
Answered By: Alexander L. Hayes