XGBoost XGBClassifier Defaults in Python
Question:
I am attempting to use XGBoost's classifier to classify some binary data. When I do the simplest thing and just use the defaults (as follows)
clf = xgb.XGBClassifier()
metLearn=CalibratedClassifierCV(clf, method='isotonic', cv=2)
metLearn.fit(train, trainTarget)
testPredictions = metLearn.predict(test)
I get reasonably good classification results.
My next step was to try tuning my parameters. Guessing from the parameters guide at…
https://github.com/dmlc/xgboost/blob/master/doc/parameter.md
I wanted to start from the default and work from there…
# setup parameters for xgboost
param = {}
param['booster'] = 'gbtree'
param['objective'] = 'binary:logistic'
param["eval_metric"] = "error"
param['eta'] = 0.3
param['gamma'] = 0
param['max_depth'] = 6
param['min_child_weight']=1
param['max_delta_step'] = 0
param['subsample']= 1
param['colsample_bytree']=1
param['silent'] = 1
param['seed'] = 0
param['base_score'] = 0.5
clf = xgb.XGBClassifier(params)
metLearn=CalibratedClassifierCV(clf, method='isotonic', cv=2)
metLearn.fit(train, trainTarget)
testPredictions = metLearn.predict(test)
The result is everything being predicted to be one of the conditions and not the other.
Curiously, if I set
params={}
which I expected to give me the same defaults as not passing any parameters, the same thing happens.
So does anyone know what the defaults for XGBClassifier are, so that I can start tuning?
Answers:
That isn’t how you set parameters in xgboost. You would either want to pass your param grid into your training function, such as xgboost’s train or sklearn’s GridSearchCV, or you would want to use your XGBClassifier’s set_params method. Another thing to note is that if you’re using xgboost’s wrapper to sklearn (i.e. the XGBClassifier() or XGBRegressor() classes), then the parameter names used are the same ones used in sklearn’s own GBM class (e.g. eta -> learning_rate). I’m not seeing where the exact documentation for the sklearn wrapper is hidden, but the code for those classes is here: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/sklearn.py
For your reference here is how you would set the model object parameters directly.
>>> grid = {'max_depth':10}
>>>
>>> clf = XGBClassifier()
>>> clf.max_depth
3
>>> clf.set_params(**grid)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=10,
min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
objective='binary:logistic', reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, seed=0, silent=True, subsample=1)
>>> clf.max_depth
10
EDIT:
I suppose you can set parameters on model creation; it just isn’t super typical to do so, since most people grid search in some way. However, if you do so you would need to either list them as full keyword arguments or unpack a dictionary with ** (as with **kwargs). For example:
>>> XGBClassifier(max_depth=10)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=10,
min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
objective='binary:logistic', reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, seed=0, silent=True, subsample=1)
>>> XGBClassifier(**grid)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=10,
min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
objective='binary:logistic', reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, seed=0, silent=True, subsample=1)
Passing a dictionary without ** unpacking binds it to the first positional parameter (max_depth in this version), so that parameter is literally set to your dictionary:
>>> XGBClassifier(grid)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
gamma=0, learning_rate=0.1, max_delta_step=0,
max_depth={'max_depth': 10}, min_child_weight=1, missing=None,
n_estimators=100, nthread=-1, objective='binary:logistic',
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=0, silent=True,
subsample=1)
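The difference between the two calls can be reproduced without xgboost at all. The following is a minimal sketch using a hypothetical stand-in function, make_classifier, whose keyword parameters only imitate the first few arguments of the wrapper shown above; it is not the real XGBClassifier constructor.

```python
def make_classifier(max_depth=3, learning_rate=0.1, n_estimators=100):
    """Hypothetical stand-in for a constructor with keyword defaults.

    It simply records which value ended up bound to which parameter.
    """
    return {
        "max_depth": max_depth,
        "learning_rate": learning_rate,
        "n_estimators": n_estimators,
    }


grid = {"max_depth": 10}

# Positional call: the whole dict is bound to the first parameter.
positional = make_classifier(grid)

# Unpacked call: each key of the dict becomes a keyword argument.
unpacked = make_classifier(**grid)

print(positional["max_depth"])
print(unpacked["max_depth"])
```

In the positional call the other parameters silently keep their defaults, which is why no error is raised at construction time.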
For starters, it looks like you’re missing an s on your variable param.
You wrote param at the top:
param = {}
param['booster'] = 'gbtree'
param['objective'] = 'binary:logistic'
...
…but use params farther down, when training the model:
clf = xgb.XGBClassifier(params)  # <-- different variable!
Was that just a typo in your example?
The defaults for XGBClassifier are:
- max_depth=3
- learning_rate=0.1
- n_estimators=100
- silent=True
- objective='binary:logistic'
- booster='gbtree'
- n_jobs=1
- nthread=None
- gamma=0
- min_child_weight=1
- max_delta_step=0
- subsample=1
- colsample_bytree=1
- colsample_bylevel=1
- reg_alpha=0
- reg_lambda=1
- scale_pos_weight=1
- base_score=0.5
- random_state=0
- seed=None
- missing=None
Link to XGBClassifier documentation with class defaults: https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier
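Since the defaults can differ between releases, it may be safer to read them from the installed class than from a list. A minimal sketch, assuming the constructor declares its defaults explicitly: ToyClassifier and constructor_defaults are hypothetical names used only for illustration, not part of xgboost. For a real scikit-learn style estimator, calling get_params() on a fresh instance is the usual way to see the same information.

```python
import inspect


class ToyClassifier:
    """Hypothetical stand-in, not the real XGBClassifier."""

    def __init__(self, max_depth=3, learning_rate=0.1, n_estimators=100):
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        self.n_estimators = n_estimators


def constructor_defaults(cls):
    """Return {parameter name: default value} for a class constructor."""
    signature = inspect.signature(cls.__init__)
    return {
        name: parameter.default
        for name, parameter in signature.parameters.items()
        if name != "self" and parameter.default is not inspect.Parameter.empty
    }


defaults = constructor_defaults(ToyClassifier)
for name, value in defaults.items():
    print(f"{name}={value!r}")
```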
You’re almost there! You just forgot to unpack the params dictionary with the ** operator. Instead of this (which passes a single dictionary as the first positional arg):
clf = xgb.XGBClassifier(params)
You should have done this (which makes it so that the keys in the dictionary are each passed as keyword args):
clf = xgb.XGBClassifier(**params)
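Before unpacking, it may also help to translate native parameter names into the names the sklearn wrapper uses. A minimal sketch: the helper to_wrapper_kwargs and the mapping NATIVE_TO_WRAPPER are hypothetical, and the mapping contains only the eta -> learning_rate rename mentioned in this thread; any other renames would need to be checked against the documentation for your installed version.

```python
# Hypothetical mapping; only the rename mentioned in this thread is included.
NATIVE_TO_WRAPPER = {"eta": "learning_rate"}


def to_wrapper_kwargs(native_params):
    """Rename known native keys; leave every other key unchanged."""
    return {
        NATIVE_TO_WRAPPER.get(key, key): value
        for key, value in native_params.items()
    }


param = {
    "objective": "binary:logistic",
    "eta": 0.3,
    "max_depth": 6,
}

kwargs = to_wrapper_kwargs(param)
print(kwargs)

# With xgboost installed and imported as xgb (an assumption here),
# the result would then be unpacked into the constructor:
# clf = xgb.XGBClassifier(**kwargs)
```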
(Updated) Default values are visible when you display the out-of-the-box classifier after fitting it:
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
importance_type='gain', interaction_constraints='',
learning_rate=0.300000012, max_delta_step=0, max_depth=6,
min_child_weight=1, missing=nan, monotone_constraints='()',
n_estimators=100, n_jobs=12, num_parallel_tree=1,
objective='multi:softprob', random_state=0, reg_alpha=0,
reg_lambda=1, scale_pos_weight=None, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=None)
Details are available here: https://xgboost.readthedocs.io/en/latest/parameter.html