How does sklearn compute the accuracy score step by step?

Question:

I was reading about the metrics used in sklearn, but I find the following pretty confusing:

[image: excerpt from the sklearn metrics documentation]

In the documentation, sklearn provides an example of its usage as follows:

import numpy as np
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)
0.5

I understood that sklearn computes that metric as follows:

[image: the computation as I understood it]

I am not sure about the process. I would appreciate it if someone could explain this result step by step, since I have been studying it but find it hard to understand. To understand it better, I tried the following case:

import numpy as np
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3, 0]
y_true = [0, 1, 2, 3, 0]
print(accuracy_score(y_true, y_pred))
0.6

I assumed that the correct computation would be the following:

[image: my attempted computation for this case]

but I am not sure about it. I would like someone to walk me through the computation rather than copy and paste sklearn's documentation.

I am unsure whether the i in the summation is the same as the i in the formula inside the parentheses. It is also unclear to me whether the number of terms in the summation depends only on the number of elements in the sample, or also on the number of classes.

Asked By: neo33


Answers:

The indicator function equals one only if its arguments are equal; otherwise its value is zero. Therefore, when y is equal to yhat, the indicator function produces a one, which counts as a correct classification. There is a code example in Python and a numerical example below.

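import numpy as np

# Predicted and true labels as arrays so they can be compared element-wise
yhat = np.array([0, 2, 1, 3])
y = np.array([0, 1, 2, 3])

# y == yhat is a boolean array (the indicator function); its mean is the accuracy
acc = np.mean(y == yhat)
print(acc)  # 0.5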

[image: numerical example]
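As a worked illustration of the indicator-function view, the four-element example can be written out as a sketch. It assumes the definition of accuracy given in the scikit-learn user guide, with $\mathbb{1}(\cdot)$ denoting the indicator function and $i$ running over the samples:

```latex
\[
\text{accuracy}(y, \hat{y})
  = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} \mathbb{1}(\hat{y}_i = y_i)
\]
\[
y = (0, 1, 2, 3), \quad \hat{y} = (0, 2, 1, 3)
\;\Rightarrow\;
\text{accuracy} = \frac{1}{4}\,(1 + 0 + 0 + 1) = \frac{2}{4} = 0.5
\]
```

Each term of the sum corresponds to one sample, which is why the result matches the 0.5 printed by the code above.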

A simple way to understand the calculation of the accuracy is:

Given two lists, y_pred and y_true, for every position index i, compare the i-th element of y_pred with the i-th element of y_true and perform the following calculation:

  1. Count the number of matches
  2. Divide it by the number of samples

So using your own example:

y_pred = [0, 2, 1, 3, 0]
y_true = [0, 1, 2, 3, 0]

We see matches on indices 0, 3 and 4. Thus:

number of matches = 3
number of samples = 5

Finally, the accuracy calculation:

accuracy = matches/samples
accuracy = 3/5
accuracy = 0.6

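As for your question about the index i: it is the sample index, so it is the same in the summation and in the Y/Yhat subscripts. The number of terms in the sum therefore depends only on the number of samples, not on the number of classes.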
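To make the role of i concrete, here is a minimal pure-Python sketch of the same count-and-divide procedure. The function name `manual_accuracy` is made up for illustration and is not part of sklearn; it is only meant to mirror the steps described above, not to reproduce sklearn's actual implementation.

```python
def manual_accuracy(y_true, y_pred):
    """Fraction of positions i where y_pred[i] equals y_true[i] (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    n_samples = len(y_true)
    matches = 0
    # i runs over the samples only; the number of classes never enters the sum.
    for i in range(n_samples):
        # Indicator function: 1 if the prediction matches the true label, else 0.
        indicator = 1 if y_pred[i] == y_true[i] else 0
        matches += indicator

    # Divide the number of matches by the number of samples.
    return matches / n_samples


y_true = [0, 1, 2, 3, 0]
y_pred = [0, 2, 1, 3, 0]
print(manual_accuracy(y_true, y_pred))  # 0.6
```

If scikit-learn is installed, `accuracy_score(y_true, y_pred)` should give the same value, and passing `normalize=False` returns the number of matches instead of the fraction.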

Answered By: Rabbit