Pytorch: how to add L1 regularizer to activations?

Question:

I would like to add the L1 regularizer to the activations output from a ReLU.
More generally, how does one add a regularizer only to a particular layer in the network?


Related material:

  • This similar post refers to adding L2 regularization, but it appears to add the regularization penalty to all layers of the network.

  • nn.modules.loss.L1Loss() seems relevant, but I do not yet understand how to use this.

  • The legacy module L1Penalty seems relevant also, but why has it been deprecated?

Asked By: Bull

||

Answers:

Here is how you do this:

  • In your Module’s forward return final output and layers’ output for which you want to apply L1 regularization
  • loss variable will be sum of cross entropy loss of output w.r.t. targets and L1 penalties.

Here’s an example code

import torch
from torch.autograd import Variable
from torch.nn import functional as F


class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out, layer1_out, layer2_out

batchsize = 4
lambda1, lambda2 = 0.5, 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# usually following code is looped over all batches 
# but let's just do a dummy batch for brevity

inputs = Variable(torch.rand(batchsize, 128))
targets = Variable(torch.ones(batchsize).long())

optimizer.zero_grad()
outputs, layer1_out, layer2_out = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)

all_linear1_params = torch.cat([x.view(-1) for x in model.linear1.parameters()])
all_linear2_params = torch.cat([x.view(-1) for x in model.linear2.parameters()])
l1_regularization = lambda1 * torch.norm(all_linear1_params, 1)
l2_regularization = lambda2 * torch.norm(all_linear2_params, 2)

loss = cross_entropy_loss + l1_regularization + l2_regularization
loss.backward()
optimizer.step()
Answered By: Sasank Chilamkurthy

@Sasank Chilamkurthy
Regularization should be the weighting parameter of each layer of the model, not the output of each layer. please look below:
Regularization

import torch
from torch.autograd import Variable
from torch.nn import functional as F


class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)
    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out

batchsize = 4
lambda1, lambda2 = 0.5, 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

inputs = Variable(torch.rand(batchsize, 128))
targets = Variable(torch.ones(batchsize).long())
l1_regularization, l2_regularization = torch.tensor(0), torch.tensor(0)

optimizer.zero_grad()
outputs = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)
for param in model.parameters():
    l1_regularization += torch.norm(param, 1)**2
    l2_regularization += torch.norm(param, 2)**2

loss = cross_entropy_loss + l1_regularization + l2_regularization
loss.backward()
optimizer.step()
Answered By: rainy

I think the original post wants to regularize the output from ReLU, so the regularizer should be on the output, not the weights of the network. They are not the same!

  • with l1-norm regularize the weights is training a neural network has sparse weights

  • with l1-norm regularize the output of a layer is training a network has a sparse output of this certain layer.

Either these above answers (including the accepted one) missed the point, or I misunderstanding the original post question.

Answered By: Tethys

All of the (other current) responses are incorrect in some way as the question is about adding regularization to activation. This one is closest in that it suggests summing the norms of the outputs, which is correct, but the code sums the norms of the weights, which is incorrect.

The correct way is not to modify the network code, but rather to capture the outputs via a forward hook, as in the OutputHook class. From there, the summing of the norms of the outputs is straightforward, but one needs to take care to clear the captured outputs every iteration.

import torch


class OutputHook(list):
    """ Hook to capture module outputs.
    """
    def __call__(self, module, input, output):
        self.append(output)


class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)
        # Instantiate ReLU, so a hook can be registered to capture its output.
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        layer1_out = self.relu(self.linear1(x))
        layer2_out = self.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out


batch_size = 4
l1_lambda = 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
# Register hook to capture the ReLU outputs. Non-trivial networks will often
# require hooks to be applied more judiciously.
output_hook = OutputHook()
model.relu.register_forward_hook(output_hook)

inputs = torch.rand(batch_size, 128)
targets = torch.ones(batch_size).long()

optimizer.zero_grad()
outputs = model(inputs)
cross_entropy_loss = torch.nn.functional.cross_entropy(outputs, targets)

# Compute the L1 penalty over the ReLU outputs captured by the hook.
l1_penalty = 0.
for output in output_hook:
    l1_penalty += torch.norm(output, 1)
l1_penalty *= l1_lambda

loss = cross_entropy_loss + l1_penalty
loss.backward()
optimizer.step()
output_hook.clear()
Answered By: ndronen

You can apply L1 regularization of the weights of a single layer of your model my_layer to the loss function with the following code:

def l1_penalty(params, l1_lambda=0.001):
    """Returns the L1 penalty of the params."""
    l1_norm = sum(p.abs().sum() for p in params)
    return l1_lambda*l1_norm

loss = loss_fn(outputs, labels) + l1_penalty(my_layer.parameters())
Answered By: iacob
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.