PyTorch: how to add an L1 regularizer to activations?
Question:
I would like to add the L1 regularizer to the activations output from a ReLU.
More generally, how does one add a regularizer only to a particular layer in the network?
Related material:

This similar post refers to adding L2 regularization, but it appears to add the regularization penalty to all layers of the network.

nn.modules.loss.L1Loss() seems relevant, but I do not yet understand how to use it.
The legacy module L1Penalty seems relevant also, but why has it been deprecated?
Answers:
Here is how you do this:
In your Module’s forward, return the final output together with the outputs of the layers to which you want to apply L1 regularization.
The loss variable will then be the sum of the cross-entropy loss of the output w.r.t. the targets and the L1 penalties.
Here’s some example code:
import torch
from torch.nn import functional as F

class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out, layer1_out, layer2_out

batchsize = 4
lambda1, lambda2 = 0.5, 0.01
model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# usually the following code is looped over all batches
# but let's just do a dummy batch for brevity
inputs = torch.rand(batchsize, 128)
targets = torch.ones(batchsize).long()

optimizer.zero_grad()
outputs, layer1_out, layer2_out = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)

# Flatten each parameter tensor (view(-1)) and concatenate per layer.
# Note: these penalties act on the weights and biases of linear1 and linear2.
all_linear1_params = torch.cat([x.view(-1) for x in model.linear1.parameters()])
all_linear2_params = torch.cat([x.view(-1) for x in model.linear2.parameters()])
l1_regularization = lambda1 * torch.norm(all_linear1_params, 1)
l2_regularization = lambda2 * torch.norm(all_linear2_params, 2)

loss = cross_entropy_loss + l1_regularization + l2_regularization
loss.backward()
optimizer.step()
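The description above talks about penalizing layer outputs, while the example code penalizes the weights of linear1 and linear2. For the activation case the question asks about, a minimal sketch might look like the following. It reuses the same MLP; the name act_lambda and its value are assumptions made only for illustration.

```python
import torch
from torch.nn import functional as F


class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        # Return the activations alongside the logits so the loss can use them.
        return out, layer1_out, layer2_out


torch.manual_seed(0)
act_lambda = 0.01  # assumed penalty weight, chosen only for illustration
model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# A single dummy batch, as in the answer above.
inputs = torch.rand(4, 128)
targets = torch.ones(4).long()

optimizer.zero_grad()
outputs, layer1_out, layer2_out = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)

# L1 penalty on the ReLU outputs (not on the weights).
activation_l1 = layer1_out.abs().sum() + layer2_out.abs().sum()
loss = cross_entropy_loss + act_lambda * activation_l1
loss.backward()
optimizer.step()
```

To penalize only one layer, drop the other term from activation_l1.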
@Sasank Chilamkurthy
Regularization should be applied to the weight parameters of each layer of the model, not to the output of each layer. Please see below:
import torch
from torch.nn import functional as F

class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out

batchsize = 4
lambda1, lambda2 = 0.5, 0.01
model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

inputs = torch.rand(batchsize, 128)
targets = torch.ones(batchsize).long()

# Float accumulators: integer tensors cannot accumulate float-valued norms.
l1_regularization, l2_regularization = torch.tensor(0.), torch.tensor(0.)

optimizer.zero_grad()
outputs = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)

for param in model.parameters():
    # L1 penalty is the plain (unsquared) norm; L2 penalty is the squared norm.
    l1_regularization = l1_regularization + torch.norm(param, 1)
    l2_regularization = l2_regularization + torch.norm(param, 2)**2

# Weight each penalty by its coefficient.
loss = cross_entropy_loss + lambda1 * l1_regularization + lambda2 * l2_regularization
loss.backward()
optimizer.step()
I think the original post wants to regularize the output of the ReLU, so the regularizer should be applied to the output, not to the weights of the network. They are not the same!

Regularizing the weights with the L1 norm trains a network that has sparse weights.

Regularizing the output of a layer with the L1 norm trains a network in which that particular layer has sparse output.
Either the answers above (including the accepted one) missed the point, or I am misunderstanding the original question.
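To make the difference concrete, here is a small sketch that computes both quantities for one layer. The layer sizes and the input are arbitrary assumptions chosen only for illustration.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(8, 4)  # hypothetical layer, sizes chosen arbitrarily
x = torch.rand(2, 8)           # dummy input batch

# Weight penalty: a function of the parameters only; the input batch plays no role.
weight_l1 = sum(p.abs().sum() for p in layer.parameters())

# Activation penalty: a function of what the layer outputs for this particular batch.
activation_l1 = torch.relu(layer(x)).abs().sum()
```

Both are scalar tensors that can be added to a loss, but minimizing the first pushes entries of the weight matrix toward zero, while minimizing the second pushes the ReLU outputs toward zero.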
All of the (other current) responses are incorrect in some way, as the question is about adding regularization to activations. This one is closest in that it suggests summing the norms of the outputs, which is correct, but the code sums the norms of the weights, which is incorrect.
The correct way is not to modify the network code, but rather to capture the outputs via a forward hook, as in the OutputHook class. From there, summing the norms of the outputs is straightforward, but one needs to take care to clear the captured outputs every iteration.
import torch

class OutputHook(list):
    """ Hook to capture module outputs.
    """
    def __call__(self, module, input, output):
        self.append(output)

class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)
        # Instantiate ReLU, so a hook can be registered to capture its output.
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        layer1_out = self.relu(self.linear1(x))
        layer2_out = self.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out

batch_size = 4
l1_lambda = 0.01
model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# Register hook to capture the ReLU outputs. Nontrivial networks will often
# require hooks to be applied more judiciously.
output_hook = OutputHook()
model.relu.register_forward_hook(output_hook)

inputs = torch.rand(batch_size, 128)
targets = torch.ones(batch_size).long()

optimizer.zero_grad()
outputs = model(inputs)
cross_entropy_loss = torch.nn.functional.cross_entropy(outputs, targets)

# Compute the L1 penalty over the ReLU outputs captured by the hook.
# The shared self.relu module is called twice, so both activations are captured.
l1_penalty = 0.
for output in output_hook:
    l1_penalty += torch.norm(output, 1)
l1_penalty *= l1_lambda

loss = cross_entropy_loss + l1_penalty
loss.backward()
optimizer.step()
# Clear the captured outputs so they do not accumulate across iterations.
output_hook.clear()
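For the more general part of the question (regularizing only one particular layer), one possible variation is to give each layer its own ReLU module and hook only the one of interest. The sketch below assumes a hypothetical torch.nn.Sequential model rather than the MLP class above, runs a few dummy batches, and removes the hook through the handle that register_forward_hook returns.

```python
import torch


class OutputHook(list):
    """Hook to capture module outputs."""

    def __call__(self, module, input, output):
        self.append(output)


# Hypothetical model with a separate ReLU module per layer.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
l1_lambda = 0.01

output_hook = OutputHook()
# model[1] is the first ReLU; only its output is captured.
handle = model[1].register_forward_hook(output_hook)

captured_per_step = []
for _ in range(3):  # a few dummy batches
    inputs = torch.rand(4, 128)
    targets = torch.ones(4).long()

    optimizer.zero_grad()
    outputs = model(inputs)
    loss = torch.nn.functional.cross_entropy(outputs, targets)
    # L1 penalty over whatever the hook captured during this forward pass.
    loss = loss + l1_lambda * sum(out.abs().sum() for out in output_hook)
    loss.backward()
    optimizer.step()

    captured_per_step.append(len(output_hook))
    output_hook.clear()  # avoid accumulating outputs across iterations

# Detach the hook once the penalty is no longer needed (e.g. for evaluation).
handle.remove()
model(torch.rand(4, 128))
```

After handle.remove(), further forward passes no longer append to output_hook.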
You can add an L1 penalty on the weights of a single layer of your model, my_layer, to the loss function with the following code:
def l1_penalty(params, l1_lambda=0.001):
    """Returns the L1 penalty of the params."""
    l1_norm = sum(p.abs().sum() for p in params)
    return l1_lambda*l1_norm

loss = loss_fn(outputs, labels) + l1_penalty(my_layer.parameters())
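As a usage sketch, the snippet above can be exercised end to end as follows. The model, loss_fn, inputs and labels here are hypothetical stand-ins, since the answer does not define them.

```python
import torch


def l1_penalty(params, l1_lambda=0.001):
    """Returns the L1 penalty of the params."""
    l1_norm = sum(p.abs().sum() for p in params)
    return l1_lambda * l1_norm


torch.manual_seed(0)
# Hypothetical two-layer model; my_layer is the layer whose weights get penalized.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 5),
    torch.nn.ReLU(),
    torch.nn.Linear(5, 2),
)
my_layer = model[0]
loss_fn = torch.nn.CrossEntropyLoss()

inputs = torch.rand(4, 10)
labels = torch.tensor([0, 1, 0, 1])

outputs = model(inputs)
loss = loss_fn(outputs, labels) + l1_penalty(my_layer.parameters())
loss.backward()
```

Note that this penalizes the parameters (weight and bias) of my_layer, not its activations.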