Loading pre-trained weights properly in PyTorch


I would like to perform transfer learning by loading a pretrained vision transformer model, modifying its last layer, and training it on my own data.

To do this, I load my dataset, apply the typical ImageNet-style transformations, load the model, disable gradients for all of its layers, remove the last layer, and add a trainable one sized to the number of classes in my dataset. My code looks roughly as follows:

# pretrained_vit_weights = torchvision.models.ViT_B_16_Weights.DEFAULT  # requires torchvision >= 0.13, "DEFAULT" means best available
# pretrained_vit = torchvision.models.vit_b_16(weights=pretrained_vit_weights).to(device)
pretrained_vit = torch.hub.load('facebookresearch/deit:main', 'deit_tiny_patch16_224', pretrained=True).to(device)

for parameter in pretrained_vit.parameters():
    parameter.requires_grad = False

pretrained_vit.heads = nn.Linear(in_features=192, out_features=len(class_names)).to(device)
optimizer = torch.optim.Adam(params=pretrained_vit.parameters(), ...)
loss_fn = torch.nn.CrossEntropyLoss()

results = engine.train(model=pretrained_vit, ..., ...)

When I use torchvision.models.ViT_B_16_Weights.DEFAULT, the code runs without any problem. However, when I use deit_tiny_patch16_224 instead and set requires_grad = False, I get the following error:

Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

When requires_grad is set to True, the code runs, but of course the training is really poor since I have only a very small number of images. How can I properly set the deit_tiny_patch16_224 parameters to parameter.requires_grad = False?

Is there an issue with the way I am loading the pre-trained weights?

Asked By: Jose Ramon



If you print the model to inspect its description, you will see that the fully connected classifier layer is registered under the key name "head", not "heads". The following code works on my end:

for parameter in pretrained_vit.parameters():
    parameter.requires_grad = False
pretrained_vit.head = nn.Linear(in_features=192, out_features=10)
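
To illustrate why the attribute name matters, here is a minimal sketch. It assumes a small hypothetical stand-in model (TinyClassifier, with made-up layer sizes apart from the 192-dimensional embedding) rather than the real DeiT network, so it runs without downloading weights. Assigning to "heads" merely registers an extra, unused submodule, while assigning to "head" replaces the layer the forward pass actually calls:

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for DeiT: a backbone followed by a `head` classifier."""

    def __init__(self, embed_dim=192, num_classes=1000):
        super().__init__()
        self.backbone = nn.Linear(16, embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        # Only `self.head` is used here; any other attribute is ignored.
        return self.head(self.backbone(x))


def output_requires_grad(attr_name, num_classes=10):
    """Freeze the model, attach a new linear layer under `attr_name`, run a forward pass."""
    model = TinyClassifier()
    model.requires_grad_(False)  # freeze every existing parameter
    setattr(model, attr_name, nn.Linear(192, num_classes))
    out = model(torch.randn(2, 16))
    return out.requires_grad, tuple(out.shape)


wrong = output_requires_grad("heads")  # new layer is never called
right = output_requires_grad("head")   # new layer replaces the classifier
print(wrong)  # (False, (2, 1000))
print(right)  # (True, (2, 10))
```

With "heads", the output still has the original 1000 classes and does not require a gradient, so calling backward() on a loss computed from it raises the RuntimeError from the question.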

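I recommend using nn.Module.requires_grad_ instead of setting the attribute yourself on each tensor parameter. Keep in mind that with your current code the whole model is frozen, including the classifier layer: assigning to "heads" only adds a new attribute that the forward pass never uses, so the original "head" stays frozen and nothing in the output requires a gradient, which is what triggers the RuntimeError. As such, you will want a trainable classifier layer; replacing "head" with a fresh nn.Linear gives you one, since the parameters of a newly created module require gradients by default: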

pretrained_vit.head = nn.Linear(in_features=192, out_features=10)
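
As a rough sketch of the nn.Module.requires_grad_ approach, the snippet below freezes a model in one call, swaps in a new head, and hands only the trainable parameters to the optimizer. TinyClassifier, the input size of 16, the learning rate, and the dummy batch are all assumptions made for illustration; with the real model you would apply the same calls to pretrained_vit:

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for DeiT: a backbone followed by a `head` classifier."""

    def __init__(self, embed_dim=192, num_classes=1000):
        super().__init__()
        self.backbone = nn.Linear(16, embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))


model = TinyClassifier()
model.requires_grad_(False)        # freeze all parameters in one call
model.head = nn.Linear(192, 10)    # fresh layer, trainable by default
model.head.requires_grad_(True)    # explicit unfreeze; a no-op here, but makes the intent clear

# Names of the parameters that will actually be updated.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]

# Pass only the trainable parameters to the optimizer (lr is an arbitrary choice).
optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)

# One dummy training step on random data to confirm that backward() works.
loss = nn.CrossEntropyLoss()(model(torch.randn(4, 16)), torch.tensor([0, 1, 2, 3]))
loss.backward()
optimizer.step()

print(trainable)  # ['head.weight', 'head.bias']
```

Filtering the parameters is optional, since Adam skips parameters that have no gradient, but it makes explicit which part of the model is being trained.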
Answered By: Ivan