Specificity in scikit-learn
Question:
I need specificity for my classification, which is defined as: TN/(TN+FP)
I am writing a custom scorer function:
from sklearn.metrics import make_scorer

def specificity_loss_func(ground_truth, predictions):
    print(predictions)
    tp, tn, fn, fp = 0.0, 0.0, 0.0, 0.0
    for l, m in enumerate(ground_truth):
        if m == predictions[l] and m == 1:
            tp += 1
        if m == predictions[l] and m == 0:
            tn += 1
        if m != predictions[l] and m == 1:
            fn += 1
        if m != predictions[l] and m == 0:
            fp += 1
    return tn / (tn + fp)

score = make_scorer(specificity_loss_func, greater_is_better=True)
Then,
from sklearn.dummy import DummyClassifier
clf_dummy = DummyClassifier(strategy='most_frequent', random_state=0)
ground_truth = [0,0,1,0,1,1,1,0,0,1,0,0,1]
p = [0,0,0,1,0,1,1,1,1,0,0,1,0]
clf_dummy = clf_dummy.fit(ground_truth, p)
score(clf_dummy, ground_truth, p)
When I run these commands, I get p printed as:
[0 0 0 0 0 0 0 0 0 0 0 0 0]
1.0
Why is my p changing to a series of zeros when I input p = [0,0,0,1,0,1,1,1,1,0,0,1,0]?
Answers:
First of all you need to know that:
DummyClassifier(strategy='most_frequent'...
This gives you a classifier that always returns the most frequent label from your training set. It does not even take the samples in X into consideration. You can pass anything instead of ground_truth in this line:
clf_dummy = clf_dummy.fit(ground_truth, p)
and the result of training and the predictions will stay the same, because the majority of labels inside p is the label “0”.
Second thing that you need to know:
make_scorer returns a function with the interface scorer(estimator, X, y).
This function calls the predict method of the estimator on X and computes your specificity function between y and the predicted labels.
So it calls clf_dummy on whatever dataset you pass (it does not matter which one, it will always return 0) and gets a vector of 0’s, then it computes the specificity between the true labels (here p, which you passed as y) and those predictions. Your predictions are all 0 because 0 was the majority class in the training set. Your score equals 1 because there are no false positive predictions.
I corrected your code to make it more convenient:
from sklearn.dummy import DummyClassifier
clf_dummy = DummyClassifier(strategy='most_frequent', random_state=0)
X = [[0],[0],[1],[0],[1],[1],[1],[0],[0],[1],[0],[0],[1]]
p = [0,0,0,1,0,1,1,1,1,0,0,1,0]
clf_dummy = clf_dummy.fit(X, p)
score(clf_dummy, X, p)
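To make the scorer behaviour described above concrete, here is a minimal sketch that reproduces by hand what the scorer does: call predict on X, then pass the true labels and the predictions to the metric. The name specificity_score is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import confusion_matrix, make_scorer

def specificity_score(y_true, y_pred):
    # Hypothetical helper: TN / (TN + FP), with 0 as the negative class
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tn / (tn + fp)

X = [[0], [0], [1], [0], [1], [1], [1], [0], [0], [1], [0], [0], [1]]
p = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]

clf = DummyClassifier(strategy="most_frequent").fit(X, p)
scorer = make_scorer(specificity_score, greater_is_better=True)

# What the scorer does internally: predict on X, then compare with y
manual_predictions = clf.predict(X)
manual_score = specificity_score(p, manual_predictions)

# Same result through the scorer interface scorer(estimator, X, y)
scorer_score = scorer(clf, X, p)

print(manual_predictions)
print(manual_score, scorer_score)
```

The printed predictions are all zeros because the classifier ignores X; p itself is never modified.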
You could get specificity from the confusion matrix. For a binary classification problem, it would be something like:
from sklearn.metrics import confusion_matrix
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn+fp)
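One caveat with unpacking ravel() is that the confusion matrix is only 2x2 when both classes appear in the data. The sketch below assumes the labels are 0 (negative) and 1 (positive) and passes labels=[0, 1] to force that shape; the helper name specificity_from_confusion_matrix and the choice to return NaN when there are no actual negatives are assumptions made for this illustration.

```python
from sklearn.metrics import confusion_matrix

def specificity_from_confusion_matrix(y_true, y_pred):
    # Hypothetical helper; assumes binary labels 0 (negative) and 1 (positive).
    # labels=[0, 1] forces a 2x2 matrix even if one class is absent from the data.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    if tn + fp == 0:
        # No actual negatives: specificity is undefined (assumption: report NaN)
        return float("nan")
    return tn / (tn + fp)

y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
print(specificity_from_confusion_matrix(y_true, y_pred))

# Only the negative class is present; without labels=[0, 1] the matrix would be 1x1
print(specificity_from_confusion_matrix([0, 0], [0, 0]))
```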
As I understand it, ‘specificity’ is just a special case of ‘recall’. Recall is calculated for the actual positive class ( TP / [TP+FN] ), whereas ‘specificity’ is the same type of calculation but for the actual negative class ( TN / [TN+FP] ).
It really only makes sense to have such specific terminology for binary classification problems. For a multi-class classification problem it would be more convenient to talk about recall with respect to each class. There is no reason why you can’t talk about recall in this way even when dealing with a binary classification problem (e.g. recall for class 0, recall for class 1).
For example, recall tells us the proportion of patients who actually have cancer that are successfully diagnosed as having cancer. To generalize, you could say Class X recall tells us the proportion of samples actually belonging to Class X that are successfully predicted as belonging to Class X.
Given this, you can use classification_report (from sklearn.metrics import classification_report) with output_dict=True to produce a dictionary of the precision, recall, f1-score and support for each label/class. You can also rely on precision_recall_fscore_support (from sklearn.metrics import precision_recall_fscore_support), depending on your preference. See the scikit-learn documentation for both.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support
labels = ['dog', 'cat', 'pig']
y_true = np.array(['cat', 'dog', 'pig', 'cat', 'dog', 'pig'])
y_pred = np.array(['cat', 'pig', 'dog', 'cat', 'cat', 'dog'])
prfs = precision_recall_fscore_support(y_true, y_pred, average=None, labels=labels)
precisions = prfs[0]
recalls = prfs[1]  # In binary classification, the recall of the negative class is the specificity
fbeta_scores = prfs[2]
supports = prfs[3]
print(recalls) # Note the order of this array is dependent on the order of your labels array
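For the binary case, the same call can be read off directly. This is a small sketch assuming the labels are 0 (negative) and 1 (positive); with labels=[0, 1] the first recall is the specificity and the second is the sensitivity.

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]

# average=None returns one value per label, in the order given by labels
precisions, recalls, fbeta_scores, supports = precision_recall_fscore_support(
    y_true, y_pred, average=None, labels=[0, 1]
)

specificity = recalls[0]  # recall of the negative class (label 0)
sensitivity = recalls[1]  # recall of the positive class (label 1)
print(specificity, sensitivity)
```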
Remembering that in binary classification, recall of the positive class is also known as “sensitivity” and recall of the negative class is “specificity”, I use this:
import numpy as np
from sklearn.metrics import recall_score

# Treat each class in turn as the positive label; recall of the negative class is the specificity
for i in np.unique(y_true):
    score = recall_score(y_true, y_pred, pos_label=i)
    print('score ' + str(i) + ' ' + str(score))
As mentioned in the other answers, specificity is the recall of the negative class. You can get it just by setting the pos_label parameter:
from sklearn.metrics import recall_score
y_true = [0, 1, 0, 0, 1, 0]
y_pred = [0, 0, 1, 1, 1, 1]
recall_score(y_true, y_pred, pos_label=0)
which returns 0.25.
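Since the original question was about a custom scorer, the same idea can be wrapped with make_scorer, which forwards extra keyword arguments such as pos_label to the metric. The following is a minimal sketch; the synthetic dataset and the choice of LogisticRegression are assumptions made only to have something to cross-validate.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_val_score

# Specificity as a scorer: recall with the negative class (0) as pos_label
specificity_scorer = make_scorer(recall_score, pos_label=0)

# Synthetic binary data, used here only for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000)

# One specificity value per cross-validation fold
scores = cross_val_score(clf, X, y, cv=5, scoring=specificity_scorer)
print(scores)
```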
I rely on classification_report from sklearn a lot and wanted to extend it with specificity values, so I came up with the following code. Note that I only add it to the macro avg, though it should be easy to extend it to the weighted average output as well.
import numpy as np
from sklearn.metrics import classification_report

def extended_classification_report(y_true: np.ndarray, y_pred: np.ndarray, classes: set = None):
    report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    report['macro avg']['specificity'] = specificity(y_true, y_pred, classes=classes)
    return report

def specificity(y_true: np.ndarray, y_pred: np.ndarray, classes: set = None):
    if classes is None:  # Determine classes from the values
        classes = set(np.concatenate((np.unique(y_true), np.unique(y_pred))))
    specs = []
    for cls in classes:
        # One-vs-rest: treat cls as the positive class and everything else as negative
        y_true_cls = (y_true == cls).astype(int)
        y_pred_cls = (y_pred == cls).astype(int)
        fp = sum(y_pred_cls[y_true_cls != 1])
        tn = sum(y_pred_cls[y_true_cls == 0] == 0)
        specificity_val = tn / (tn + fp)
        specs.append(specificity_val)
    return np.mean(specs)
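As an alternative to the manual loop, the per-class one-vs-rest counts can also be taken from multilabel_confusion_matrix, which returns one 2x2 matrix per label laid out as [[TN, FP], [FN, TP]]. This is a sketch rather than a drop-in replacement; the variable names are chosen for illustration and the data is the cat/dog/pig example from earlier in the thread.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

labels = ['dog', 'cat', 'pig']
y_true = np.array(['cat', 'dog', 'pig', 'cat', 'dog', 'pig'])
y_pred = np.array(['cat', 'pig', 'dog', 'cat', 'cat', 'dog'])

# Shape (n_labels, 2, 2); each entry is [[TN, FP], [FN, TP]] for one label vs the rest
mcm = multilabel_confusion_matrix(y_true, y_pred, labels=labels)
tn = mcm[:, 0, 0]
fp = mcm[:, 0, 1]

# Per-class specificity, in the order of labels, and its unweighted (macro) mean
per_class_specificity = tn / (tn + fp)
macro_specificity = per_class_specificity.mean()

print(dict(zip(labels, per_class_specificity)))
print(macro_specificity)
```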