PyTorch and Neural Networks: How many parameters in a layer?

Question:

I've seen many sources talk about the number of parameters in a neural network and mention that it is calculated as:

num parameters = (filter width * filter height * number of filters in the previous layer + 1) * number of filters in the current layer

but I've been having trouble understanding how that applies to networks created using nn from torch.

For example, how many parameters would this network have?

from torch import nn


class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
Asked By: JoshAsh

Answers:

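An nn.Linear(m, n) object holds a weight matrix with m*n entries (PyTorch stores it with shape [n, m]).
For example, nn.Linear(28*28, 512) has (28*28)*512 = 401,408 weights.
See the nn.Linear documentation for more information.

The nn.Flatten() and nn.ReLU() modules do not contain parameters.

Edit: the count above covers the weights only. With the default bias=True, nn.Linear also has one bias parameter per output feature, so this layer has 401,408 + 512 = 401,920 parameters in total.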
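
The arithmetic for the three linear layers in the question can be sketched in plain Python. The helper name linear_param_count is made up for this illustration and is not part of PyTorch; it assumes the default nn.Linear setting of one bias per output feature.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Hypothetical helper: parameters of one fully connected layer."""
    weights = in_features * out_features      # one weight per (input, output) pair
    biases = out_features if bias else 0      # one bias per output feature
    return weights + biases


# (in_features, out_features) of the three nn.Linear layers in the question
layer_sizes = [(28 * 28, 512), (512, 512), (512, 10)]

per_layer = [linear_param_count(i, o) for i, o in layer_sizes]
total = sum(per_layer)

print(per_layer)  # [401920, 262656, 5130]
print(total)      # 669706
```

Passing bias=False gives the weights-only count, for example 401,408 for the first layer.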

Answered By: Ground Zero

PyTorch itself does not ship a summary function, but the third-party torchsummary package (pip install torchsummary) provides one. It prints how your network is structured, the number of parameters in each layer, and the total size.

In your case, you can print it as:

from torchsummary import summary

net = NeuralNetwork()
summary(net, (1, 28, 28))
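
If you would rather not install an extra package, a similar per-layer breakdown can be sketched with plain PyTorch, using named_parameters() and numel(). This is only a minimal sketch: it repeats the model from the question so that it runs on its own, and the variable names counts and total are chosen for this example.

```python
from torch import nn


class NeuralNetwork(nn.Module):
    """Same model as in the question, repeated so the sketch is self-contained."""

    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        return self.linear_relu_stack(self.flatten(x))


net = NeuralNetwork()

# Map each parameter tensor's name to its number of elements.
counts = {name: p.numel() for name, p in net.named_parameters()}
for name, n in counts.items():
    print(f"{name:32s} {n:>10,d}")

total = sum(counts.values())
print(f"{'total':32s} {total:>10,d}")
```

Only the nn.Linear layers show up in the listing, each with a weight and a bias entry, because nn.Flatten and nn.ReLU have no parameters.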
Answered By: Pathi_rao

In addition to Pathi_rao's answer: the summary function from the torchsummary module has a device parameter whose default value is cuda. If your model is on the CPU, either pass the device explicitly with summary(net, (1,28,28), device='cpu'), or move the model to the GPU with summary(net.to('cuda'), (1,28,28)). Otherwise it raises a RuntimeError.

Answered By: Okafor Chukwuka

The formula you give is valid for convolution filters of image shapes in CNN layers.

Think of the data being passed around as fibers of filter values over each point of the image grid. The action of a 1×1 filter is then to augment each input fiber with the constant component 1 (which accounts for the bias) and apply a matrix-vector multiplication. If the filter is larger, the matrix part of this operation is repeated for every point of the filter, while the bias is still added only once per output filter. Thus the number of coefficients is

num_coefficients = filter_size * in_fiber_dim * out_fiber_dim + out_fiber_dim
                 = (filter_size * in_fiber_dim + 1) * out_fiber_dim

with

filter_size = filter_width * filter_height

in_fiber_dim = number of filters in previous layer
out_fiber_dim = number of filters in current layer

A linear layer is the special case filter_size = 1, so nn.Linear(28*28, 512) has (28*28 + 1) * 512 parameters.
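
As a rough check, the formula quoted in the question (one bias per output filter) can be written as a small Python function. The name conv2d_param_count is hypothetical and chosen only for this sketch; it assumes a standard 2D convolution with no grouping.

```python
def conv2d_param_count(filter_width, filter_height, in_fiber_dim, out_fiber_dim, bias=True):
    """Hypothetical helper: parameters of one 2D convolution layer."""
    filter_size = filter_width * filter_height
    # One weight per filter point, input filter and output filter.
    weights = filter_size * in_fiber_dim * out_fiber_dim
    # One bias per output filter (assumed convention).
    biases = out_fiber_dim if bias else 0
    return weights + biases


# 3x3 filters, 3 input filters (e.g. RGB), 64 output filters
print(conv2d_param_count(3, 3, 3, 64))         # 1792

# A linear layer behaves like a 1x1 filter: nn.Linear(28*28, 512)
print(conv2d_param_count(1, 1, 28 * 28, 512))  # 401920
```
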
Answered By: Lutz Lehmann