scikit-learn: How do I define the thresholds for the ROC curve?
Question:
When plotting the ROC curve (or deriving the AUC) in scikit-learn, how can one specify arbitrary thresholds for roc_curve, rather than having the function calculate them internally and return them?
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_true, y_pred)
A related question was asked at Scikit – How to define thresholds for plotting roc curve, but the accepted answer there indicates that the OP's intent was different from what the question's wording suggested.
Thanks!
Answers:
What you get from the classifier are scores, not just a class prediction. roc_curve will give you a set of thresholds with the associated false positive and true positive rates.
If you want your own threshold, just apply it:
y_class = y_pred > threshold
Then you can display a confusion matrix comparing this new y_class to y_true.
And if you want several thresholds, do the same for each of them and compute a confusion matrix for each, from which you can read off the true and false positive rates.
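For example, here is a minimal sketch of that approach. The y_true and y_score arrays are made-up illustration data; in practice y_score would come from something like predict_proba(X)[:, 1] or decision_function(X):
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 0, 1])                # ground-truth labels (illustrative)
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])  # classifier scores (illustrative)

for threshold in [0.3, 0.5, 0.7]:                    # your own arbitrary thresholds
    y_class = (y_score > threshold).astype(int)
    # For binary labels, confusion_matrix returns [[tn, fp], [fn, tp]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_class).ravel()
    tpr = tp / (tp + fn)                              # true positive rate (recall)
    fpr = fp / (fp + tn)                              # false positive rate
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")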
It's quite simple. The ROC curve shows you the model's output across different thresholds. You always choose the best threshold for your model to produce forecasts, but the ROC curve shows you how robust/good your model is across the full range of thresholds. There is a good explanation of how this works here: https://www.dataschool.io/roc-curves-and-auc-explained/
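For instance, one common (but by no means the only) way to pick a single operating threshold from roc_curve's output is Youden's J statistic (tpr - fpr). A rough sketch, reusing the made-up data from above:
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1])                # illustrative data
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
best = np.argmax(tpr - fpr)                           # index maximizing Youden's J
print(f"threshold={thresholds[best]}: TPR={tpr[best]:.2f}, FPR={fpr[best]:.2f}")
Whether Youden's J is the right criterion depends on your application; cost-sensitive settings often weight false positives and false negatives differently.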