In a logistic regression model using scikit-learn, how does the confusion matrix know which is the positive class?

Question:

I am working on a cancer prediction task (where 1 is a cancer case and 0 is a control). The tutorials I’ve watched never seem to tell the Logistic Regression model which class is the positive one before producing the confusion matrix.

By default, do the true positives count the ‘1’s that were correctly predicted, and the true negatives the ‘0’s?

Asked By: codinggirl123

||

Answers:

In sklearn.metrics.confusion_matrix we have a parameter called labels with default value None. The documentation of labels tells us:

List of labels to index the matrix. This may be used to reorder or
select a subset of labels. If None is given, those that appear at
least once in y_true or y_pred are used in sorted order.

So to control which row and column each class gets, pass the classes to labels in the order you want.
Say, for example, positive = 1 and negative = 0:

>>> from sklearn.metrics import confusion_matrix as cm
>>> y_test = [1, 0, 0]
>>> y_pred = [1, 0, 0]
>>> cm(y_test, y_pred, labels=[1,0])
array([[1, 0],
       [0, 2]])

              Pred
             |  pos=1 | neg=0 |
         ___________________
Actual  pos=1|  TP=1  | FN=0 |
        neg=0|  FP=0  | TN=2 |

Note: Passing labels=[1,0] changes where TP, TN, FP, and FN sit compared with the default order. TP means both the predicted and actual values are positive. TN means both the predicted and actual values are negative. FP means the prediction is positive but the actual value is negative, and FN is the reverse.

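If we don't pass anything to labels, the values that appear in y_true and y_pred are used in sorted order, i.e. [0,1]. confusion_matrix itself has no notion of a positive class; with 0/1 labels and the usual convention that 1 is positive, TN lands in the top-left cell and TP in the bottom-right.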

>>> y_test = [1, 0, 0]
>>> y_pred = [1, 0, 0]
>>> cm(y_test, y_pred)
array([[2, 0],
       [0, 1]])

              Pred
             |  neg=0 | pos=1 |
         ___________________
Actual  neg=0|  TN=2  | FP=0 |
        pos=1|  FN=0  | TP=1 |
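For the binary case, one way to avoid misreading the layout is to flatten the matrix and unpack it into named counts. Here is a minimal sketch; the y_test and y_pred values are made up for illustration and chosen so that all four counts differ:

```python
from sklearn.metrics import confusion_matrix

# Illustrative data (not from the question): 4 actual positives, 6 actual negatives.
y_test = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Default label order is sorted, [0, 1], so the flattened matrix reads TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

# With labels=[1, 0] the positive class comes first, so the order is TP, FN, FP, TN.
tp2, fn2, fp2, tn2 = confusion_matrix(y_test, y_pred, labels=[1, 0]).ravel()

print(tn, fp, fn, tp)
print(tp2, fn2, fp2, tn2)
```

The counts are the same either way; only the unpacking order has to match the label order you passed.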

This becomes even clearer with more than two labels. Say Cat=1, Dog=2, Mouse=3.
If you want the order to be Cat, Mouse, Dog, then pass labels=[1,3,2]:

>>> y_test = [1, 2, 3]
>>> y_pred = [1, 3, 2]
>>> cm(y_test, y_pred, labels=[1,3,2])
array([[1, 0, 0],
       [0, 0, 1],
       [0, 1, 0]])

                Pred
          |  1  |  3  |  2 |
          __________________
Actual  1 |   1 |  0  |  0 |
        3 |   0 |  0  |  1 |
        2 |   0 |  1  |  0 |

If you want some other order, such as Dog, Mouse, Cat, then pass labels=[2,3,1]:

>>> cm(y_test, y_pred, labels=[2,3,1])
array([[0, 1, 0],
       [1, 0, 0],
       [0, 0, 1]])
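As a sanity check, reordering labels only permutes the rows and columns; the counts themselves do not change. The sketch below compares the reordered matrix against the default one using NumPy's np.ix_. The variable names are illustrative and not part of the scikit-learn API:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_test = [1, 2, 3]
y_pred = [1, 3, 2]

# Default: rows and columns follow the sorted labels [1, 2, 3].
default_cm = confusion_matrix(y_test, y_pred)

# Requested order: Dog=2, Mouse=3, Cat=1.
order = [2, 3, 1]
reordered_cm = confusion_matrix(y_test, y_pred, labels=order)

# Position of each requested label within the sorted default order.
sorted_labels = sorted(set(y_test) | set(y_pred))
idx = [sorted_labels.index(label) for label in order]

# Applying the same permutation to rows and columns of the default matrix
# should reproduce the reordered matrix.
same = np.array_equal(reordered_cm, default_cm[np.ix_(idx, idx)])
print(same)
```

The diagonal still holds the correct predictions for each class, whatever order you choose.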
 
Answered By: MSS