Label Smoothing in PyTorch

Question:

I’m building a ResNet-18 classification model for the Stanford Cars dataset using transfer learning. I would like to implement label smoothing to penalize overconfident predictions and improve generalization.

TensorFlow has a simple keyword argument in CrossEntropyLoss. Has anyone built a similar function for PyTorch that I could plug-and-play with?

Asked By: Jared Nielsen

||

Answers:

None that I know of.

Here are two examples of PyTorch implementation:

Answered By: Jindřich

I’ve been looking for options that derives from _Loss like other loss classes in PyTorch and respects basic parameters such as reduction. Unfortunately I can’t find straight forward replacement so ended up writing my own. I haven’t fully tested this yet, however:

import torch
from torch.nn.modules.loss import _WeightedLoss
import torch.nn.functional as F

class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    @staticmethod
    def _smooth_one_hot(targets:torch.Tensor, n_classes:int, smoothing=0.0):
        assert 0 <= smoothing < 1
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                    device=targets.device) 
                .fill_(smoothing /(n_classes-1)) 
                .scatter_(1, targets.data.unsqueeze(1), 1.-smoothing)
        return targets

    def forward(self, inputs, targets):
        targets = SmoothCrossEntropyLoss._smooth_one_hot(targets, inputs.size(-1),
            self.smoothing)
        lsm = F.log_softmax(inputs, -1)

        if self.weight is not None:
            lsm = lsm * self.weight.unsqueeze(0)

        loss = -(targets * lsm).sum(-1)

        if  self.reduction == 'sum':
            loss = loss.sum()
        elif  self.reduction == 'mean':
            loss = loss.mean()

        return loss

Other options:

Answered By: Shital Shah

label smoothing PyTorch implementation
Ref: https://github.com/wangleiofficial/label-smoothing-pytorch

import torch.nn.functional as F

def linear_combination(x, y, epsilon):
    return epsilon * x + (1 - epsilon) * y


def reduce_loss(loss, reduction='mean'):
    return loss.mean() if reduction == 'mean' else loss.sum() if reduction == 'sum' else loss


class LabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, epsilon: float = 0.1, reduction='mean'):
        super().__init__()
        self.epsilon = epsilon
        self.reduction = reduction

    def forward(self, preds, target):
        n = preds.size()[-1]
        log_preds = F.log_softmax(preds, dim=-1)
        loss = reduce_loss(-log_preds.sum(dim=-1), self.reduction)
        nll = F.nll_loss(log_preds, target, reduction=self.reduction)
        return linear_combination(loss / n, nll, self.epsilon)
Answered By: neo lei

The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming over-confident and label smoothing has been used in many state-of-the-art models, including image classification, language translation, and speech recognition.


Label Smoothing is already implemented in Tensorflow within the cross-entropy loss functions. BinaryCrossentropy, CategoricalCrossentropy. But currently, there is no official implementation of Label Smoothing in PyTorch. However, there is going an active discussion on it and hopefully, it will be provided with an official package. Here is that discussion thread: Issue #7455.

Here We will bring some available best implementation of Label Smoothing (LS) from PyTorch practitioner. Basically, there are many ways to implement the LS. Please refer to this specific discussion on this, one is here, and another here. Here we will bring implementation in 2 unique ways with two versions of each; so total 4.

Option 1: CrossEntropyLossWithProbs

In this way, it accepts the one-hot target vector. The user must manually smooth their target vector. And it can be done within with torch.no_grad() scope, as it temporarily sets all of the requires_grad flags to false.

  1. Devin Yang: Source
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss


class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1, weight = None):
        """if smoothing == 0, it's one-hot method
           if 0 < smoothing < 1, it's smooth method
        """
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.weight = weight
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        assert 0 <= self.smoothing < 1
        pred = pred.log_softmax(dim=self.dim)

        if self.weight is not None:
            pred = pred * self.weight.unsqueeze(0)   

        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))

Additionally, we’ve added an assertion checkmark on self. smoothing and added loss weighting support on this implementation.

  1. Shital Shah: Source

Shital already posted the answer here. Here we’re pointing out that this implementation is similar to Devin Yang‘s above implementation. However, here we’re mentioning his code with minimizing a bit of code syntax.

class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    def k_one_hot(self, targets:torch.Tensor, n_classes:int, smoothing=0.0):
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                                  device=targets.device) 
                                  .fill_(smoothing /(n_classes-1)) 
                                  .scatter_(1, targets.data.unsqueeze(1), 1.-smoothing)
        return targets

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() 
        if self.reduction == 'sum' else loss

    def forward(self, inputs, targets):
        assert 0 <= self.smoothing < 1

        targets = self.k_one_hot(targets, inputs.size(-1), self.smoothing)
        log_preds = F.log_softmax(inputs, -1)

        if self.weight is not None:
            log_preds = log_preds * self.weight.unsqueeze(0)

        return self.reduce_loss(-(targets * log_preds).sum(dim=-1))

Check

import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss


if __name__=="__main__":
    # 1. Devin Yang
    crit = LabelSmoothingLoss(classes=5, smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1], 
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # 2. Shital Shah
    crit = SmoothCrossEntropyLoss(smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1], 
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

tensor(1.4178)
tensor(1.4178)

Option 2: LabelSmoothingCrossEntropyLoss

By this, it accepts the target vector and uses doesn’t manually smooth the target vector, rather the built-in module takes care of the label smoothing. It allows us to implement label smoothing in terms of F.nll_loss.

(a). Wangleiofficial: Source – (AFAIK), Original Poster

(b). Datasaurus: Source – Added Weighting Support

Further, we slightly minimize the coding write-up to make it more concise.

class LabelSmoothingLoss(torch.nn.Module):
    def __init__(self, smoothing: float = 0.1, 
                 reduction="mean", weight=None):
        super(LabelSmoothingLoss, self).__init__()
        self.smoothing   = smoothing
        self.reduction = reduction
        self.weight    = weight

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() 
         if self.reduction == 'sum' else loss

    def linear_combination(self, x, y):
        return self.smoothing * x + (1 - self.smoothing) * y

    def forward(self, preds, target):
        assert 0 <= self.smoothing < 1

        if self.weight is not None:
            self.weight = self.weight.to(preds.device)

        n = preds.size(-1)
        log_preds = F.log_softmax(preds, dim=-1)
        loss = self.reduce_loss(-log_preds.sum(dim=-1))
        nll = F.nll_loss(
            log_preds, target, reduction=self.reduction, weight=self.weight
        )
        return self.linear_combination(loss / n, nll)
  1. NVIDIA/DeepLearningExamples: Source
class LabelSmoothing(nn.Module):
    """NLL loss with label smoothing.
    """
    def __init__(self, smoothing=0.0):
        """Constructor for the LabelSmoothing module.
        :param smoothing: label smoothing factor
        """
        super(LabelSmoothing, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing

    def forward(self, x, target):
        logprobs = torch.nn.functional.log_softmax(x, dim=-1)
        nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
        nll_loss = nll_loss.squeeze(1)
        smooth_loss = -logprobs.mean(dim=-1)
        loss = self.confidence * nll_loss + self.smoothing * smooth_loss
        return loss.mean()

Check

if __name__=="__main__":
    # Wangleiofficial
    crit = LabelSmoothingLoss(smoothing=0.3, reduction="mean")
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1], 
                                 [1, 0.2, 0.7, 0.9, 1]])

    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # NVIDIA
    crit = LabelSmoothing(smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1], 
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

tensor(1.3883)
tensor(1.3883)

Update: Officially Added

torch.nn.CrossEntropyLoss(weight=None, size_average=None, 
                          ignore_index=- 100, reduce=None, 
                          reduction='mean', label_smoothing=0.0)
Answered By: M.Innat

This currently has no official implementation in PyTorch, but has been proposed both as a high priority Feature Request #7455, and separately in TorchVision Issue #2980.


There are a number of implementations in other libraries:

As well as a number of unofficial implementations/code snippets:


TensorFlow / Keras implementation
tf.keras.losses.CategoricalCrossentropy(label_smoothing)

Answered By: iacob

From version 1.10.0 Pytorch officially supports label smoothing and soft targets in torch.nn.CrossEntropyLoss.

Answered By: tnss913