How are the probabilities normalized in the one-vs-rest scheme of sklearn LogisticRegression?
Question:
In the sklearn LogisticRegression classifier, we can set the multi_class option to ovr, which stands for one-vs-rest, as in the following code snippet:
# logistic regression for multi-class classification using built-in one-vs-rest
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=3, random_state=1)
# define model
model = LogisticRegression(multi_class='ovr')
# fit model
model.fit(X, y)
Now, this classifier can assign probabilities to different classes for given instances:
# make predictions
yhat = model.predict_proba(X)
The probabilities sum to 1 for each instance:
array([[0.16973178, 0.46755188, 0.36271634],
[0.58228627, 0.0928127 , 0.32490103],
[0.28241256, 0.51175978, 0.20582766],
...,
[0.17922774, 0.71300755, 0.10776471],
[0.05888508, 0.24924809, 0.69186683],
[0.25808835, 0.68599321, 0.05591844]])
My question: In the one-vs-rest method, a separate classifier is trained for each class. Therefore, we expect the probability for each class to be independent of the other classes. How are the probabilities normalized to sum to 1?
Answers:
The probabilities are normalized by dividing by the row sum (i.e. the sum of the class probabilities for each sample). This is the relevant line in the source code:
prob /= prob.sum(axis=1).reshape((prob.shape[0], -1))
The code below shows how to use this formula to replicate the model outputs:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# generate some data
X, y = make_classification(n_classes=3, n_features=10, n_informative=5, n_redundant=5, n_samples=1000, random_state=1)
# fit the model
model = LogisticRegression(multi_class='ovr')
model.fit(X, y)
prob_pred = model.predict_proba(X)
print(prob_pred)
# [[0.16973178 0.46755188 0.36271634]
# [0.58228627 0.0928127 0.32490103]
# [0.28241256 0.51175978 0.20582766]
# ...
class_pred = model.predict(X)
print(class_pred)
# [1 0 1 2 0 2 1 2 0 1 1 0 2 1 0 1 2 0 1 0 ...
# replicate the model's outputs
classes = np.unique(y)
n_classes = len(classes)
n_samples = len(y)
prob_pred = np.zeros((n_samples, n_classes))
class_pred = np.zeros(n_samples)
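for c in classes:
    # fit one binary classifier: class c vs. the rest
    y_ = np.where(y == c, 1, 0)
    model = LogisticRegression()
    model.fit(X, y_)
    prob_pred[:, c] = model.predict_proba(X)[:, 1]
# divide each row by its sum so that the class probabilities sum to 1
prob_pred /= prob_pred.sum(axis=1).reshape((prob_pred.shape[0], -1))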
print(prob_pred)
# [[0.16973178 0.46755188 0.36271634]
# [0.58228627 0.0928127 0.32490103]
# [0.28241256 0.51175978 0.20582766]
# ...
class_pred = classes[np.argmax(prob_pred, axis=1)]
print(class_pred)
# [1 0 1 2 0 2 1 2 0 1 1 0 2 1 0 1 2 0 1 0 ...
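To see why the division is needed, here is a small worked example with made-up numbers (the three probabilities are hypothetical, not taken from the fitted model above). Each one-vs-rest classifier outputs its own probability, so the raw values need not sum to 1. Dividing by their sum fixes that without changing which class ranks highest.

```python
# Hypothetical outputs of three independent one-vs-rest classifiers
# for a single sample: P(class k vs. rest) for k = 0, 1, 2.
raw = [0.2, 0.6, 0.4]

total = sum(raw)  # about 1.2, so the raw values are not a distribution
normalized = [p / total for p in raw]

print(total)
print(normalized)
print(sum(normalized))

# Dividing by a positive constant keeps the ordering,
# so the predicted class is the same before and after.
raw_winner = max(range(len(raw)), key=lambda k: raw[k])
norm_winner = max(range(len(normalized)), key=lambda k: normalized[k])
print(raw_winner, norm_winner)
```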
As you can see here, multiclass is handled by normalizing the score of each class for the instance x over all classes: the estimated probability that the instance belongs to class k is given by

$$\hat{p}_k(x) = \frac{\sigma(f_k(x))}{\sum_{j=1}^{K} \sigma(f_j(x))}$$

with f_k the decision function of the classifier for class k, \sigma the logistic sigmoid, and K the number of classes.
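As a rough sketch of this normalization starting from decision-function values: assuming, in line with the row-sum division quoted in the first answer, that each class score is the sigmoid of that class's decision value, the probability for class k is that sigmoid divided by the sum of the sigmoids over all K classes. The helper name `ovr_proba` and the score matrix are illustrative assumptions, not part of scikit-learn.

```python
import numpy as np

def ovr_proba(scores):
    """Turn per-class decision values f_k(x) into one-vs-rest probabilities.

    scores: array of shape (n_samples, K), one column per binary classifier.
    (Hypothetical helper, not a scikit-learn function.)
    """
    scores = np.asarray(scores, dtype=float)
    sig = 1.0 / (1.0 + np.exp(-scores))          # sigmoid of each f_k(x), per class
    return sig / sig.sum(axis=1, keepdims=True)  # divide each row by its sum

# Made-up decision values for two samples and K = 3 classes.
f = np.array([[-1.0, 0.5, 0.0],
              [2.0, -2.0, 0.0]])
p = ovr_proba(f)
print(p.sum(axis=1))     # each row sums to 1 (up to floating point)
print(p.argmax(axis=1))  # same classes as f.argmax(axis=1): the sigmoid is monotonic
```

Because the sigmoid is monotonic and each row is divided by a positive number, the class with the largest decision value also gets the largest normalized probability.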