PyTorch and Neural Networks: How many parameters in a layer?
Question:
I've seen many sources discuss the number of parameters in a neural network and state that it is calculated as:
num parameters = ((filter width * filter height * number of filters in the previous layer + 1) * number of filters)
but I've been having trouble understanding how that applies to networks created with torch.nn.
For example, how many parameters would this network have?
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
Answers:
The object nn.Linear(m, n) represents an affine map whose weight matrix has dimension [n, m]. For example, nn.Linear(28*28, 512) has (28*28)*512 parameters (weights). Check here for more information about it.
The objects nn.Flatten() and nn.ReLU() contain no parameters.
edit: this count ignores the bias of the linear layer, which adds another n parameters (512 in this example).
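Putting those per-layer counts together for the network in the question (this time including the bias terms, since nn.Linear has bias=True by default), a quick sanity check in plain Python — linear_params is just an illustrative helper, not a PyTorch function:

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of nn.Linear(in_features, out_features):
    a weight matrix of in_features * out_features entries,
    plus out_features bias terms when bias=True."""
    return in_features * out_features + (out_features if bias else 0)

# The three Linear layers of the network in the question.
layers = [(28 * 28, 512), (512, 512), (512, 10)]
total = sum(linear_params(i, o) for i, o in layers)
print(total)  # 669706
```

The same total can be read off a real model with sum(p.numel() for p in net.parameters()).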
The torchsummary package provides a summary function that prints a summary of the network, including how the network is structured, the number of parameters per layer, and the total size.
In your case, you can print it as:
from torchsummary import summary
net = NeuralNetwork()
summary(net, (1, 28, 28))
In addition to Pathi_rao's answer: the summary function from the torchsummary module has a device parameter whose default value is 'cuda'. You need to either change the device to CPU, summary(net, (1, 28, 28), device='cpu'), or move the model to CUDA, summary(net.to('cuda'), (1, 28, 28)); otherwise it will raise a RuntimeError.
The formula you quote applies to convolutional layers in a CNN, not to fully-connected layers.
Think of the data being passed around as fibers of filter values over each point of the image grid. The action of a 1×1 filter is to augment the input fibers with the constant component $1$ (the bias) and apply a matrix-vector multiplication. If the filter is larger, this operation is repeated, with its own matrix, over every point of the filter window. Thus the number of coefficients is
num_coefficients = filter_size * matrix_size
with
filter_size = filter_width * filter_height
matrix_size = (in_fiber_dim + 1) * out_fiber_dim
in_fiber_dim = number of filters in previous layer
out_fiber_dim = number of filters in current layer
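The formula above can be sketched in plain Python; conv_params is a made-up helper name, and the numbers chosen are only an example (a 3×3 convolution from 64 to 128 channels):

```python
def conv_params(filter_w, filter_h, in_channels, out_channels, bias=True):
    """Parameter count of a conv layer following the formula:
    (filter_w * filter_h * in_channels + 1) * out_channels,
    where the +1 accounts for the per-output-channel bias."""
    per_filter = filter_w * filter_h * in_channels + (1 if bias else 0)
    return per_filter * out_channels

print(conv_params(3, 3, 64, 128))  # 73856
```

This matches what sum(p.numel() for p in nn.Conv2d(64, 128, 3).parameters()) would report.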