What does model.train() do in PyTorch?
Question:
Does it call forward() in nn.Module? I thought that when we call the model, the forward method is used.
Why do we need to specify train()?
Answers:
model.train() tells your model that you are training it. This informs layers such as Dropout and BatchNorm, which are designed to behave differently during training and evaluation. For instance, in training mode BatchNorm updates its running (moving-average) statistics on each new batch, whereas in evaluation mode these updates are frozen.
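The BatchNorm behavior described above can be checked with a small sketch. The layer size, the batch, and the variable names below are illustrative assumptions, not part of the original answer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)           # running_mean starts at zeros
x = torch.randn(8, 3) + 5.0      # a batch whose mean is far from zero

bn.train()
before = bn.running_mean.clone()
bn(x)                            # a forward pass in training mode updates the running stats
after_train = bn.running_mean.clone()

bn.eval()
bn(x)                            # a forward pass in eval mode leaves them untouched
after_eval = bn.running_mean.clone()

print(torch.equal(before, after_train))      # False: updated while training
print(torch.equal(after_train, after_eval))  # True: frozen while evaluating
```

Note that the update happens during the forward pass, not when train() itself is called.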
More details: model.train() sets the mode to train (see source code). You can call either model.eval() or model.train(mode=False) to tell the model that you are testing.
It is somewhat intuitive to expect the train function to train the model, but it does not do that. It just sets the mode.
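A minimal sketch of the point that train() only sets the mode. The toy model and the hook-based call counter are assumptions made for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 2), nn.Dropout(p=0.5))

# Count forward passes with a hook so we can see that train() triggers none.
calls = []
model.register_forward_hook(lambda module, inputs, output: calls.append(1))

weights_before = model[0].weight.detach().clone()
returned = model.train()

print(returned is model)                              # True: train() returns the module itself
print(model.training)                                 # True: the flag is set
print(len(calls))                                     # 0: no forward pass was run
print(torch.equal(weights_before, model[0].weight))   # True: no weights changed
```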
There are two ways of letting the model know your intention, i.e., whether you want to train the model or use it for evaluation: model.train() and model.eval(). model.train() tells mode-dependent layers to use their training behavior, and model.eval() tells them to use their inference behavior, so nothing new is accumulated in those layers while testing. Neither call performs any learning by itself.
model.eval() is also necessary because, if the model uses BatchNorm and you pass a single sample at test time, PyTorch can throw an error when model.eval() has not been called: in training mode BatchNorm needs more than one value per channel to compute batch statistics.
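The single-sample error can be reproduced with a short sketch. It assumes a BatchNorm1d layer and an input of shape (1, 4), so there is exactly one value per channel. A single image passed through BatchNorm2d with more than one pixel per channel would not hit this check.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
single = torch.randn(1, 4)   # a batch of one sample: one value per channel

bn.train()
try:
    bn(single)               # batch statistics cannot be computed from one value
    raised = False
except ValueError as err:
    raised = True
    print("train mode:", err)

bn.eval()
out = bn(single)             # running statistics are used, so one sample is fine
print("eval mode output shape:", tuple(out.shape))
```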
Here is the code for nn.Module.train():

    def train(self, mode=True):
        r"""Sets the module in training mode."""
        self.training = mode
        for module in self.children():
            module.train(mode)
        return self

Here is the code for nn.Module.eval():

    def eval(self):
        r"""Sets the module in evaluation mode."""
        return self.train(False)
By default, the self.training flag is set to True, i.e., modules are in train mode by default. When self.training is False, the module is in the opposite state, eval mode.
Of the most commonly used layers, only Dropout and BatchNorm care about that flag.
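As a rough illustration of the default value and of the recursion over children in the source above, the sketch below builds a small nested module and inspects the flag. The class names Block and Net are hypothetical.

```python
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(0.5)

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = Block()          # nested submodule
        self.bn = nn.BatchNorm1d(4)

net = Net()
default_all_training = all(m.training for m in net.modules())

net.eval()                            # propagates to every descendant
any_training_after_eval = any(m.training for m in net.modules())

net.train()                           # and back again
all_training_after_train = all(m.training for m in net.modules())

print(default_all_training)           # True
print(any_training_after_eval)        # False
print(all_training_after_train)       # True
```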
The current official documentation states the following:
This has any [sic] effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
model.train() | model.eval() |
---|---|
Sets the model in training mode, i.e. • BatchNorm layers use per-batch statistics • Dropout layers are activated, etc. | Sets the model in evaluation (inference) mode, i.e. • BatchNorm layers use running statistics • Dropout layers are de-activated, etc. Equivalent to model.train(False). |
Note: neither of these function calls runs a forward or backward pass. They tell the model how to act when it is run.
This is important because some modules (layers), e.g. Dropout and BatchNorm, are designed to behave differently during training vs. inference, and hence the model will produce unexpected results if run in the wrong mode.
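One common way to arrange the two calls around the actual passes might look like the sketch below. The model, data, optimizer, and the use of torch.no_grad() are assumptions chosen for illustration. In particular, eval() does not disable gradient tracking, which is why torch.no_grad() is used separately here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(10, 16), nn.BatchNorm1d(16), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(16, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))   # made-up data

# Training step: set the mode first, then run the passes yourself.
model.train()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

# Evaluation: switch mode, then run forward passes without tracking gradients.
model.eval()
with torch.no_grad():
    pred_a = model(x)
    pred_b = model(x)

# With Dropout off and BatchNorm using running stats, repeated calls agree.
print(torch.allclose(pred_a, pred_b))
```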
Consider the following model:

    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GCNConv

    class GraphNet(torch.nn.Module):
        def __init__(self, num_node_features, num_classes):
            super(GraphNet, self).__init__()
            self.conv1 = GCNConv(num_node_features, 16)
            self.conv2 = GCNConv(16, num_classes)

        def forward(self, data):
            x, edge_index = data.x, data.edge_index
            x = self.conv1(x, edge_index)
            x = F.dropout(x, training=self.training)  # Look here
            x = self.conv2(x, edge_index)
            return F.log_softmax(x, dim=1)
Here, dropout behaves differently in the two modes of operation. As you can see, it is applied only when self.training == True. So, after you call model.train(), the model's forward function will perform dropout; otherwise (say, after model.eval() or model.train(mode=False)) it will not.
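The same pattern can be tried without torch_geometric. The sketch below swaps GCNConv for a plain nn.Linear layer (an assumption made only to keep the example self-contained) and keeps the F.dropout(..., training=self.training) call.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):   # hypothetical stand-in for GraphNet
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x):
        x = self.fc(x)
        # Functional dropout has no mode of its own; it follows the flag we pass.
        return F.dropout(x, p=0.5, training=self.training)

torch.manual_seed(0)
net = TinyNet()
x = torch.randn(4, 8)

net.train()
differs = not torch.equal(net(x), net(x))   # a fresh random mask on every call

net.eval()
same = torch.equal(net(x), net(x))          # dropout is a no-op, output is repeatable

print(differs, same)
```

Because F.dropout defaults to training=True, leaving out training=self.training would keep dropout active even after model.eval().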