Can sklearn.model_selection.GridSearchCV use a custom threshold?
Question:
My goal is to tune the decision threshold before tuning the hyperparameters. The idea is simple: in an imbalanced dataset where class 1 is the minority, the threshold should be lower than 0.5, so that the model predicts more instances as class 1 instead of 0.
Therefore, I believe that by changing the threshold early, we can improve the model's predictive power even more than with the usual order (parameter tuning, then threshold tuning).
The problem is that I can't find a parameter in GridSearchCV to change the threshold.
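To make the idea concrete, here is a toy sketch in plain Python. The labels and probabilities are made up for illustration, and predict_at and recall are hypothetical helper names, not part of scikit-learn:

```python
# Made-up data: 7 negatives and 3 positives (class 1 is the minority).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.18, 0.30, 0.35, 0.45, 0.25, 0.40, 0.70]


def predict_at(y_prob, threshold):
    """Label an instance as class 1 when its probability exceeds the threshold."""
    return [int(p > threshold) for p in y_prob]


def recall(y_true, y_pred):
    """Fraction of the true class-1 instances that were predicted as class 1."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_pos / sum(y_true)


for threshold in (0.5, 0.2):
    y_pred = predict_at(y_prob, threshold)
    print(threshold, sum(y_pred), round(recall(y_true, y_pred), 3))
```

With these numbers, the 0.5 threshold catches only one of the three positives, while 0.2 catches all three at the cost of three false positives.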
Answers:
You can't directly change the threshold used by predict (which presumably gets called by your scorer), but you can provide a custom scoring method. See the User Guide. Here I think you'd want something like:
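    from sklearn.metrics import fbeta_score, make_scorer

    def f2_score_at_thresh(y_true, y_pos_prob, threshold):
        # Predict class 1 whenever its probability exceeds the threshold.
        y_pred = y_pos_prob > threshold
        return fbeta_score(y_true, y_pred, beta=2, ...)

    # On scikit-learn >= 1.4, pass response_method="predict_proba" instead of needs_proba=True.
    my_scorer = make_scorer(f2_score_at_thresh, needs_proba=True, threshold=0.2)
    GridSearchCV(..., scoring=my_scorer)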
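For a version that runs end to end, here is a minimal sketch of the same idea. It uses a plain callable with the (estimator, X, y) signature that GridSearchCV also accepts for scoring, which sidesteps the make_scorer keyword differences between versions. The helper name make_f2_scorer, the synthetic data, the LogisticRegression estimator and the C grid are all assumptions chosen for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score
from sklearn.model_selection import GridSearchCV


def make_f2_scorer(threshold):
    """Build a scorer(estimator, X, y) that computes F2 at a fixed threshold."""

    def scorer(estimator, X, y):
        # Column 1 of predict_proba is the probability of estimator.classes_[1],
        # which is class 1 when the labels are 0/1.
        pos_prob = estimator.predict_proba(X)[:, 1]
        y_pred = (pos_prob > threshold).astype(int)
        return fbeta_score(y, y_pred, beta=2, zero_division=0)

    return scorer


# Synthetic imbalanced data (roughly 10% positives), for illustration only.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},  # hypothetical grid
    scoring=make_f2_scorer(threshold=0.2),
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Note that the threshold only affects how candidates are scored: search.best_estimator_.predict still uses the default cut-off, so you would apply the 0.2 threshold to predict_proba yourself at prediction time. Recent scikit-learn releases (1.5 and later) also ship FixedThresholdClassifier and TunedThresholdClassifierCV in sklearn.model_selection, which may be a better fit if your version has them.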