What exactly is the definition of a 'Module' in PyTorch?

Question:

Please excuse the novice question, but is Module just the same as saying model?

That’s what it sounds like, when the documentation says:

Whenever you want a model more complex than a simple sequence of existing Modules you will need to define your model (as a custom Module subclass).

Or… when they mention Module, are they referring to something more formal and computer-sciency, like a protocol / interface type thing?

Asked By: Monica Heddneck


Answers:

I am not a PyTorch expert, but my understanding is that a module in PyTorch is simply a container that receives tensors as input and computes tensors as output.

So, in conclusion, your model is quite likely to be composed of multiple modules, for example, you might have 3 modules each representing a layer of a neural network. Thus, they are related in the sense you need modules to actualise your model, but they aren’t the same thing.
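As a rough sketch of that idea (the layer sizes and variable names below are made up for illustration), three layer modules can be combined into a model that is itself a module:

```python
import torch
import torch.nn as nn

# Three modules, each acting as one layer of a small network.
layer1 = nn.Linear(4, 8)
layer2 = nn.Linear(8, 8)
layer3 = nn.Linear(8, 2)

# The model is itself a module that contains the layer modules
# (the ReLU activations in between are modules too).
model = nn.Sequential(layer1, nn.ReLU(), layer2, nn.ReLU(), layer3)

x = torch.randn(5, 4)  # batch of 5 inputs with 4 features each
y = model(x)           # tensors in, tensors out

both_are_modules = isinstance(layer1, nn.Module) and isinstance(model, nn.Module)
print(both_are_modules, y.shape)
```

Both the single layer and the whole model pass the isinstance check, which is the sense in which "model" and "module" are related but not identical.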

Hope that helps

Answered By: JustDanyul

It’s a simple container.

From the docs of nn.Module

Base class for all neural network modules. Your models should also subclass this class. Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes. Submodules assigned in this way will be registered, and will have their parameters converted too when you call .cuda(), etc.

From the tutorial:

All network components should inherit from nn.Module and override the forward() method. That is about it, as far as the boilerplate is concerned. Inheriting from nn.Module provides functionality to your component. For example, it makes it keep track of its trainable parameters, you can swap it between CPU and GPU with the .to(device) method, where device can be a CPU device torch.device("cpu") or CUDA device torch.device("cuda:0").

A module is a container from which layers, model subparts (e.g. BasicBlock in torchvision's ResNet) and models should inherit. Why should they? Because inheriting from nn.Module lets you call methods like .to("cuda:0"), .eval() and .parameters(), or register hooks easily.
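Here is a minimal sketch of those inherited methods in action. TinyBlock and its layer sizes are hypothetical, and the device falls back to the CPU when CUDA is not available:

```python
import torch
import torch.nn as nn

class TinyBlock(nn.Module):  # hypothetical example component
    def __init__(self):
        super().__init__()
        # Assigning modules as attributes registers them as submodules.
        self.fc = nn.Linear(3, 2)
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        return self.drop(self.fc(x))

block = TinyBlock()

# .parameters() collects the parameters of all registered submodules:
# 3*2 weights + 2 biases for the Linear layer, none for Dropout.
n_params = sum(p.numel() for p in block.parameters())

# .to(device) moves every registered parameter in one call.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
block.to(device)

# .eval() switches the module and its submodules to evaluation mode,
# in which Dropout passes its input through unchanged.
block.eval()

# A forward hook runs each time the hooked module finishes a forward pass.
seen_shapes = []
handle = block.fc.register_forward_hook(
    lambda module, inputs, output: seen_shapes.append(tuple(output.shape))
)
out = block(torch.ones(4, 3, device=device))
handle.remove()  # detach the hook when it is no longer needed
```

None of this had to be written in TinyBlock itself; it all comes from the nn.Module base class.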

  • why not just call the ‘module’ a model, and call the layers ‘layers’? I suppose maybe it’s just semantics and splitting hairs, but still…

That’s an API design choice and I find having only a Module class instead of two separate Model and Layers to be cleaner and to allow more freedom (it’s easier to send just a part of the model to GPU, to get parameters only for some layers…).
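To make the "more freedom" point concrete, here is a small sketch (the two-layer model and the backbone/head naming are assumptions for illustration) that works with only one part of a model:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 20),  # index 0: treated as the "backbone" here
    nn.ReLU(),
    nn.Linear(20, 2),   # index 2: treated as the "head" here
)
head = model[2]  # a layer is a module, so it has the same API as the model

# Parameters of just one layer: a 2x20 weight plus 2 biases.
head_param_count = sum(p.numel() for p in head.parameters())

# An optimizer that only updates the head.
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
n_optimized_tensors = len(optimizer.param_groups[0]["params"])

# Move only this submodule; falls back to the CPU when CUDA is unavailable.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
head.to(device)
```

Because the head and the whole model share one class, the same calls work at any level of nesting.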

Answered By: iacolippo

why not just call the ‘module’ a model, and call the layers ‘layers’?

This is a matter of heritage: PyTorch descends from Torch, which was originally written in Lua, and Torch already called it a module.

What exactly is the definition of a ‘Module’ in PyTorch?

There are several ways to define it.

Here is a pragmatic one:

  • A module is something that has a structure and runs forward through that structure to produce the output (return value).

This one is structural:

  • A module also knows its state: you can ask it for its parameters with module.parameters().

This one is functional:

  • You can call module.zero_grad() to reset the gradients of all parameters inside the module. Do this before every backward pass, because gradients otherwise accumulate across calls to backward(). So a module also takes part in backprop, the step that computes gradients for the parameters marked for update; the optimizer then applies the update.

Module parameters marked for update have requires_grad=True like this:

Parameter containing:
tensor([-0.4411, -0.2094, -0.5322, -0.0154, -0.1009], requires_grad=True)

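A Parameter is a Tensor subclass: it has requires_grad=True by default and is registered automatically when you assign it as a module attribute. The requires_grad flag decides whether the parameter receives a gradient during backprop and can therefore be updated.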
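The interplay of requires_grad, zero_grad() and backprop can be sketched as one training step. The model, the random data and the learning rate below are placeholders chosen only for illustration:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 3), nn.ReLU(), nn.Linear(3, 1))

# Freeze the first layer: its parameters no longer receive gradients.
for p in model[0].parameters():
    p.requires_grad_(False)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

x, target = torch.randn(8, 5), torch.randn(8, 1)

model.zero_grad()    # clear gradients left over from earlier steps
loss = nn.functional.mse_loss(model(x), target)
loss.backward()      # backprop: fills .grad for the trainable parameters
optimizer.step()     # the optimizer applies the update

frozen_has_no_grad = all(p.grad is None for p in model[0].parameters())
head_has_grad = all(p.grad is not None for p in model[2].parameters())
```

Only the weight and bias of the last layer end up with gradients, so only they are changed by optimizer.step().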

Finally, back to the forward step for an important note:

import torch.nn as nn

class ZebraNet(nn.Module):

    def __init__(self, num_classes=1000):
        super().__init__()
        self.convpart = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpooling = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.convpart(x)
        x = self.avgpooling(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x

You can see how the structure is set up in __init__ and how forward() specifies what happens to the input x and what is returned. The return value has the dimensions of the output we need: here, one score per class for each input image. How accurately the model predicts the output determines its accuracy, which is usually the metric we track.
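To check the output dimension in practice, here is a sketch with a much smaller network of the same shape. SmallNet, its layer sizes and the 32x32 input are assumptions made to keep the example quick to run:

```python
import torch
import torch.nn as nn

class SmallNet(nn.Module):  # hypothetical, far smaller than ZebraNet
    def __init__(self, num_classes=10):
        super().__init__()
        self.convpart = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.avgpooling = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Linear(8 * 6 * 6, num_classes)

    def forward(self, x):
        x = self.convpart(x)
        x = self.avgpooling(x)
        x = x.view(x.size(0), 8 * 6 * 6)  # flatten each sample
        return self.classifier(x)

model = SmallNet(num_classes=10)
batch = torch.randn(4, 3, 32, 32)  # 4 RGB images of 32x32 pixels
logits = model(batch)              # calling the module runs forward()
print(logits.shape)                # one row of class scores per image
```

Note that the module is called as model(batch) rather than model.forward(batch), so that any registered hooks also run.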

Answered By: prosti

why not just call the ‘module’ a model, and call the layers ‘layers’?

Recall from a data structures course how you define a binary tree:

class tree:
    def __init__(self, value, left, right):
        self.value = value
        self.left = left
        self.right = right

You can add a subtree or a leaf to a tree to form a new tree, just as you can add a submodule to a module to form a new module. You don't want subtree and tree, or leaf and tree, to be different data structures, because in the end they are all trees. In the same way, Module represents both models and layers. Think of it as recursive; it is an API design choice that keeps things clean.
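The tree analogy can be made literal with a short sketch. The nested model and the count_leaves helper are made up for illustration; children() and named_modules() are the nn.Module methods that expose the tree:

```python
import torch.nn as nn

# A module that contains a module that contains modules.
model = nn.Sequential(
    nn.Sequential(nn.Linear(4, 4), nn.ReLU()),
    nn.Linear(4, 2),
)

def count_leaves(module):
    """Recursively count modules that have no submodules (the leaves)."""
    children = list(module.children())
    if not children:
        return 1
    return sum(count_leaves(child) for child in children)

leaf_count = count_leaves(model)  # Linear, ReLU, Linear

# named_modules() walks the same tree, starting at the root (named '').
names = [name for name, _ in model.named_modules()]
print(leaf_count, names)
```

The recursion needs no special cases for "model" versus "layer", because every node in the tree is a Module.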

What exactly is the definition of a ‘Module’ in PyTorch?

I like to think of a module as something that takes an input and produces an output, just like a function. That is what the forward method of the Module class does: it specifies what the function is. You need to override the default forward method because otherwise PyTorch would not know what the function is.

def forward(self, x):
    x = F.relu(self.conv1(x))
    x = F.max_pool2d(x, 2, 2)
    x = F.relu(self.conv2(x))
    x = F.max_pool2d(x, 2, 2)
    x = x.view(-1, 4*4*50)
    x = F.relu(self.fc1(x))
    x = self.fc2(x)
    return F.log_softmax(x, dim=1)

Another example is nn.Sequential. It is also a module, but a special one: it takes a sequence of modules and chains their inputs and outputs together.

nn.Sequential(a, b, c)  # a -> b -> c

That's why you do not need to write a forward method for it: it is defined implicitly (take the output of one module and feed it to the next).
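A quick way to convince yourself of this is to compare nn.Sequential with chaining the same modules by hand. The layer sizes here are arbitrary:

```python
import torch
import torch.nn as nn

a, b, c = nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)
seq = nn.Sequential(a, b, c)

x = torch.randn(3, 4)
chained_by_hand = c(b(a(x)))  # feed each output into the next module
same = torch.allclose(seq(x), chained_by_hand)
print(same)
```

The container holds the very same a, b and c objects, so both paths use identical weights.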

Another example is nn.Conv2d. It is also a module, and its forward method is already defined, so you don't need to write it:

class _ConvNd(Module):
    ...  # omitted
class Conv2d(_ConvNd):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1,
                 bias=True, padding_mode='zeros'):
        kernel_size = _pair(kernel_size)
        stride = _pair(stride)
        padding = _pair(padding)
        dilation = _pair(dilation)
        super(Conv2d, self).__init__(
            in_channels, out_channels, kernel_size, stride, padding, dilation,
            False, _pair(0), groups, bias, padding_mode)

    def conv2d_forward(self, input, weight):
        if self.padding_mode == 'circular':
            expanded_padding = ((self.padding[1] + 1) // 2, self.padding[1] // 2,
                                (self.padding[0] + 1) // 2, self.padding[0] // 2)
            return F.conv2d(F.pad(input, expanded_padding, mode='circular'),
                            weight, self.bias, self.stride,
                            _pair(0), self.dilation, self.groups)
        return F.conv2d(input, weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

    def forward(self, input):
        return self.conv2d_forward(input, self.weight)
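As a usage sketch (channel counts and input size are arbitrary), calling an nn.Conv2d module gives the same result as calling the functional F.conv2d with the module's own weight, bias and settings, which is essentially what the forward method quoted above does for the default padding mode:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
x = torch.randn(2, 3, 8, 8)

out_module = conv(x)  # uses the forward() that nn.Conv2d already defines

# The same computation, written out with the functional API.
out_functional = F.conv2d(
    x, conv.weight, conv.bias,
    stride=conv.stride, padding=conv.padding,
    dilation=conv.dilation, groups=conv.groups,
)
matches = torch.allclose(out_module, out_functional)
print(out_module.shape, matches)
```

The module is the stateful wrapper (it owns weight and bias), while F.conv2d is the stateless function it delegates to.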

Also, if anyone wonders how PyTorch builds a graph and does backpropagation, check this out. (Please do not take this code too seriously, since I am not sure this is how PyTorch implements it; but take the idea with you, as it may help you understand how PyTorch works.)

some silly code
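The snippet referred to above is not reproduced here. As an independent illustration of the general idea, and explicitly not PyTorch's actual implementation, here is a toy reverse-mode autodiff for scalars. The Value class and everything in it is hypothetical: each operation records its inputs, which builds the graph, and backward() walks that graph in reverse, applying the chain rule.

```python
class Value:
    """A scalar that remembers which values it was computed from."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents    # edges of the computation graph
        self.backward_fn = None   # pushes self.grad on to the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out.backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out.backward_fn = backward_fn
        return out

    def backward(self):
        # Order the graph so each node comes after all of its parents.
        order, visited = [], set()
        def visit(node):
            if id(node) not in visited:
                visited.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0  # d(output)/d(output)
        for node in reversed(order):
            if node.backward_fn is not None:
                node.backward_fn()

w, x, b = Value(2.0), Value(3.0), Value(1.0)
y = w * x + b   # the forward pass builds the graph as a side effect
y.backward()    # the backward pass fills in .grad on w, x and b
print(y.data, w.grad, x.grad, b.grad)
```

For y = w * x + b the gradients are dy/dw = x, dy/dx = w and dy/db = 1, which is what the toy class computes.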
Hope this helps 🙂

PS: I am new to deep learning and PyTorch. This answer may contain mistakes, so read carefully.

Answered By: cjkkkk