How to find the optimum number of estimators using "OOB" method in sklearn boosting?

Question:

The gbm package in R has a function, gbm.perf, that finds the optimum number of trees for a model using methods such as "Out-of-Bag" (OOB) or "Cross-Validation" error, which helps avoid over-fitting.

Does GradientBoosting in the scikit-learn library in Python also have a similar function to find the optimum number of trees using the "out-of-bag" method?

#r code

mod1 = gbm(var ~ ., data = dat, interaction.depth = 3)
best.iter = gbm.perf(mod1, method = "OOB")
scores = mean(predict(mod1, x, n.trees = best.iter))

#python code

modl = GradientBoostingRegressor(max_depth=3)
modl.fit(x, y)
scores = np.mean(modl.predict(x))
Asked By: lakshman thota


Answers:

Yes, GradientBoostingRegressor in scikit-learn can also find the best iteration using out-of-bag estimates, much like gbm in R, via its oob_improvement_ attribute.

"in order to use oob_improvement_ in gdm the subsample should be less than 0.5"

# Fit regressor with out-of-bag estimates
import numpy as np
from sklearn import ensemble

params = {
    "n_estimators": 1200,
    "max_depth": 3,
    "subsample": 0.5,  # must be < 1.0 for oob_improvement_ to be available
}
modl = ensemble.GradientBoostingRegressor(**params)
modl.fit(x, y)  # oob_improvement_ is only populated after fitting

n_estimators = params["n_estimators"]
z = np.arange(n_estimators) + 1
# negative cumulative sum of OOB improvements gives the OOB loss curve
cumsum = -np.cumsum(modl.oob_improvement_)
# iteration with the minimum loss according to OOB
oob_best_iter = z[np.argmin(cumsum)]
print(oob_best_iter)

# refit with the OOB-selected number of trees
modl = ensemble.GradientBoostingRegressor(
    max_depth=3, subsample=0.5, n_estimators=oob_best_iter
)
modl.fit(x, y)
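Putting it all together, here is a self-contained version of the same workflow on synthetic data (a sketch only; `make_regression` and the hyperparameter values stand in for your own data and settings):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data as a stand-in for the asker's x, y
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# subsample < 1.0 is required for oob_improvement_ to be computed
model = GradientBoostingRegressor(
    n_estimators=300, max_depth=3, subsample=0.5, random_state=0
)
model.fit(X, y)

# OOB loss at each stage = negative cumulative sum of per-stage improvements
oob_loss = -np.cumsum(model.oob_improvement_)
best_n = int(np.argmin(oob_loss)) + 1  # stages are 1-indexed
print(best_n)

# Refit with the OOB-selected number of trees
final = GradientBoostingRegressor(
    n_estimators=best_n, max_depth=3, subsample=0.5, random_state=0
)
final.fit(X, y)
```

This mirrors the gbm.perf(mod1, method = "OOB") pattern from R: the argmin of the OOB loss curve plays the role of best.iter.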
Answered By: lakshman thota