sklearn: early_stopping with eval_set?
Question:
I was using xgboost and it provides the early_stopping feature, which is quite good.
However, when I look at sklearn's fit function, I see only the Xtrain, ytrain parameters and no parameter for early_stopping.
Is there a way to pass an evaluation set to sklearn for early stopping?
Thanks
Answers:
In sklearn.ensemble.GradientBoosting, early stopping must be configured when you instantiate the model, not when you call fit. The relevant constructor parameters are:
validation_fraction : float, optional, default 0.1
The proportion of training data to set aside as a validation set for early stopping. Must be between 0 and 1. Only used if n_iter_no_change is set to an integer.
n_iter_no_change : int, default None
n_iter_no_change is used to decide whether early stopping will be used to terminate training when the validation score is not improving. By default it is set to None to disable early stopping. If set to a number, it will set aside a validation_fraction-sized portion of the training data as a validation set and terminate training when the validation score has not improved in any of the previous n_iter_no_change iterations.
tol : float, optional, default 1e-4
Tolerance for early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), training stops.
To enable early stopping, pass the above arguments to your model's constructor.
You may want to read the sklearn example "Early stopping of Gradient Boosting" for a full explanation and worked examples.
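A minimal sketch of that configuration (the parameter values here are illustrative, not prescriptive):

```python
# Early stopping in sklearn is configured on the estimator, not in fit().
# n_iter_no_change switches it on; validation_fraction is carved out of the
# training data internally rather than supplied as an external eval_set.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)

gbc = GradientBoostingClassifier(
    n_estimators=200,          # upper bound on boosting rounds
    validation_fraction=0.1,   # 10% of the training data held out internally
    n_iter_no_change=5,        # stop after 5 rounds with no improvement
    tol=1e-4,                  # minimum improvement that counts
    random_state=0,
)
gbc.fit(X, y)

# n_estimators_ is the number of rounds actually run; with early stopping
# it can be well below the n_estimators ceiling.
print(gbc.n_estimators_)
```

After fitting, `n_estimators_` tells you where training actually stopped.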
For xgboost.sklearn.XGBClassifier(), the parameter name is early_stopping_rounds, passed when you call .fit().
Working example!
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost.sklearn import XGBClassifier

breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)

GBM = XGBClassifier()
GBM.fit(X_train, y_train, eval_metric="auc",
        eval_set=[(X_test, y_test)], early_stopping_rounds=2)
Note that in xgboost 2.0 and later, eval_metric and early_stopping_rounds are set on the XGBClassifier constructor rather than passed to .fit().
If you intend to use sklearn.ensemble.GradientBoostingClassifier(), then you have to set tol to 0 and n_iter_no_change to the value you would have given early_stopping_rounds.
Note: sklearn.ensemble.GradientBoostingClassifier() does not take a separate validation dataset; you have to feed it the complete dataset and specify the validation fraction using validation_fraction.
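A hedged sketch of that mapping, approximating XGBoost's early_stopping_rounds=2 (validation_fraction=0.2 here is an illustrative stand-in for an external eval_set of that size):

```python
# With tol=0, any strict improvement resets the patience counter, which
# mirrors how XGBoost's early_stopping_rounds counts non-improving rounds.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)

gbc = GradientBoostingClassifier(
    n_estimators=100,
    n_iter_no_change=2,       # analogue of early_stopping_rounds=2
    tol=0.0,                  # any non-improvement counts against patience
    validation_fraction=0.2,  # internal split instead of an external eval_set
    random_state=0,
)
gbc.fit(X, y)
print(gbc.n_estimators_)
```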
If you are using GradientBoostingClassifier
The other answers provided here won't help if you want k-fold cross-validation, and they also won't help if you want early stopping on a metric of your choice. Why?
- Because there is no way to pass the evaluation set in from outside
- Because there is no provision for a custom metric
Use the following code to achieve early stopping the way you are used to doing it:
# create a gradient booster
from sklearn.ensemble import GradientBoostingClassifier

gbc = GradientBoostingClassifier()

# define the metric function that you want to use for early stopping
def accuracy(y_true, y_preds):
    # example implementation: y_preds are positive-class probabilities,
    # thresholded at 0.5; substitute any metric you like here
    return ((y_preds > 0.5) == y_true).mean()

# This class, passed via the monitor argument of fit(), enables early stopping
class early_stopping_gbc():
    def __init__(self, accuracy, eval_set, early_stopping_rounds=20):
        self.accuracy = accuracy
        self.x_val = eval_set[0]
        self.y_val = eval_set[1]
        self.best_perf = 0.
        self.counter = 0
        self.early_stopping_rounds = early_stopping_rounds

    def __call__(self, i, model, local_vars):
        # walk the staged predictions lazily and stop at the stage just fitted;
        # consuming further stages would touch estimators that don't exist yet
        for counter, preds in enumerate(model.staged_predict_proba(self.x_val)):
            if counter == i:
                break
        acc = self.accuracy(self.y_val, preds[:, 1])
        if acc > self.best_perf:
            self.best_perf = acc
            self.counter = 0
        else:
            self.counter += 1
        # returning True tells sklearn to stop training
        return self.counter > self.early_stopping_rounds

# Run gradient booster with early stopping after 20 stagnant rounds
gbc.fit(X_train, y_train,
        monitor=early_stopping_gbc(accuracy, [X_val, y_val],
                                   early_stopping_rounds=20))
This enables you to do k-fold cross-validation and also use the metric of your choice.
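As a sketch of the k-fold claim, assuming the same monitor pattern (the EarlyStoppingMonitor class, the 0.5 threshold, and accuracy_score as the metric are illustrative choices, not from the original answer):

```python
# Monitor-based early stopping inside a KFold loop: each fold gets its own
# external validation set, something the built-in n_iter_no_change cannot do.
from itertools import islice

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

class EarlyStoppingMonitor:
    def __init__(self, metric, X_val, y_val, rounds=10):
        self.metric, self.X_val, self.y_val = metric, X_val, y_val
        self.rounds, self.best, self.wait = rounds, -np.inf, 0

    def __call__(self, i, model, local_vars):
        # islice consumes exactly stages 0..i, never touching unfitted stages
        preds = next(islice(model.staged_predict_proba(self.X_val), i, None))
        score = self.metric(self.y_val, (preds[:, 1] > 0.5).astype(int))
        if score > self.best:
            self.best, self.wait = score, 0
        else:
            self.wait += 1
        return self.wait > self.rounds  # True stops training

X, y = load_breast_cancer(return_X_y=True)
scores = []
for train_idx, val_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    gbc = GradientBoostingClassifier(n_estimators=100, random_state=0)
    monitor = EarlyStoppingMonitor(accuracy_score, X[val_idx], y[val_idx], rounds=10)
    gbc.fit(X[train_idx], y[train_idx], monitor=monitor)
    scores.append(monitor.best)

print(np.mean(scores))
```

Each fold stops independently once its own validation score stalls, so the mean score reflects early-stopped models throughout.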