Check the total number of parameters in a PyTorch model

Question:

How do I count the total number of parameters in a PyTorch model? Something similar to model.count_params() in Keras.

Asked By: Fábio Perez


Answers:

PyTorch doesn’t have a function to calculate the total number of parameters as Keras does, but it’s possible to sum the number of elements of every parameter tensor:

pytorch_total_params = sum(p.numel() for p in model.parameters())

If you want to calculate only the trainable parameters:

pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

Answer inspired by this answer on PyTorch Forums.
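As a quick sanity check (an illustrative toy model, not part of the original answer), freezing one layer shows how the two counts differ:

from torch import nn

# Hypothetical toy model: freeze the first layer so the two counts diverge.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
for p in model[0].parameters():
    p.requires_grad = False

print(sum(p.numel() for p in model.parameters()))                     # 58 = (4*8 + 8) + (8*2 + 2)
print(sum(p.numel() for p in model.parameters() if p.requires_grad))  # 18 = 8*2 + 2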

Answered By: Fábio Perez

If you want to calculate the number of weights and biases in each layer without instantiating the model, you can load the saved state_dict file directly and iterate over the resulting collections.OrderedDict like so:

import torch


tensor_dict = torch.load('model.dat', map_location='cpu')  # collections.OrderedDict
for layer_tensor_name, tensor in tensor_dict.items():
    print('{}: {}'.format(layer_tensor_name, torch.numel(tensor)))

You’ll get something like

conv1.weight: 312
conv1.bias: 26
batch_norm1.weight: 26
batch_norm1.bias: 26
batch_norm1.running_mean: 26
batch_norm1.running_var: 26
conv2.weight: 2340
conv2.bias: 10
batch_norm2.weight: 10
batch_norm2.bias: 10
batch_norm2.running_mean: 10
batch_norm2.running_var: 10
fcs.layers.0.weight: 135200
fcs.layers.0.bias: 260
fcs.layers.1.weight: 33800
fcs.layers.1.bias: 130
fcs.batch_norm_layers.0.weight: 260
fcs.batch_norm_layers.0.bias: 260
fcs.batch_norm_layers.0.running_mean: 260
fcs.batch_norm_layers.0.running_var: 260
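
Note (not part of the original answer): a state_dict stores buffers such as the BatchNorm running statistics above as well as learnable parameters, so summing over it can give a larger count than summing over model.parameters(). Reusing the tensor_dict loaded above:

# Total element count across everything in the state_dict; this includes
# buffers (running_mean, running_var, num_batches_tracked), so it may exceed
# the count obtained from model.parameters().
total_elements = sum(torch.numel(t) for t in tensor_dict.values())
print('Total elements in state_dict: {}'.format(total_elements))
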
Answered By: Zhanwen Chen

Another possible solution, which prints a per-layer parameter count:

def model_summary(model):
  print("model_summary")
  print()
  print("Layer_name" + "\t" * 7 + "Number of Parameters")
  print("=" * 100)
  model_parameters = [layer for layer in model.parameters() if layer.requires_grad]
  layer_name = [child for child in model.children()]
  j = 0
  total_params = 0
  print("\t" * 10)
  for i in layer_name:
    print()
    if not list(i.parameters()):
      # skip parameter-free layers such as ReLU or pooling
      continue
    param = 0
    try:
      bias = (i.bias is not None)
    except AttributeError:
      bias = False
    if bias:
      # the layer has both a weight and a bias tensor
      param = model_parameters[j].numel() + model_parameters[j + 1].numel()
      j = j + 2
    else:
      # the layer has only a weight tensor
      param = model_parameters[j].numel()
      j = j + 1
    print(str(i) + "\t" * 3 + str(param))
    total_params += param
  print("=" * 100)
  print(f"Total Params:{total_params}")

model_summary(net)

This would give output similar to the one below:

model_summary

Layer_name                          Number of Parameters
====================================================================================================

Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))             60
Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))            880
Linear(in_features=576, out_features=120, bias=True)        69240
Linear(in_features=120, out_features=84, bias=True)         10164
Linear(in_features=84, out_features=10, bias=True)          850
====================================================================================================
Total Params:81194
Answered By: Shashank Nigam

To get the parameter count of each layer like Keras, PyTorch has model.named_parameters(), which returns an iterator over both the parameter name and the parameter itself. Example:

from prettytable import PrettyTable

def count_parameters(model):
    table = PrettyTable(["Modules", "Parameters"])
    total_params = 0
    for name, parameter in model.named_parameters():
        if not parameter.requires_grad: continue
        params = parameter.numel()
        table.add_row([name, params])
        total_params+=params
    print(table)
    print(f"Total Trainable Params: {total_params}")
    return total_params
    
count_parameters(net)

Example output:

+-------------------+------------+
|      Modules      | Parameters |
+-------------------+------------+
| embeddings.weight |   922866   |
|    conv1.weight   |  1048576   |
|     conv1.bias    |    1024    |
|     bn1.weight    |    1024    |
|      bn1.bias     |    1024    |
|    conv2.weight   |  2097152   |
|     conv2.bias    |    1024    |
|     bn2.weight    |    1024    |
|      bn2.bias     |    1024    |
|    conv3.weight   |  2097152   |
|     conv3.bias    |    1024    |
|     bn3.weight    |    1024    |
|      bn3.bias     |    1024    |
|    lin1.weight    |  50331648  |
|     lin1.bias     |    512     |
|    lin2.weight    |   265728   |
|     lin2.bias     |    519     |
+-------------------+------------+
Total Trainable Params: 56773369
Answered By: Thong Nguyen

To avoid double counting shared parameters, use torch.Tensor.data_ptr. E.g.:

sum(dict((p.data_ptr(), p.numel()) for p in model.parameters()).values())

Here’s a more verbose implementation that can optionally filter out non-trainable parameters:

import torch


def numel(m: torch.nn.Module, only_trainable: bool = False):
    """
    Returns the total number of parameters used by `m` (only counting
    shared parameters once); if `only_trainable` is True, then only
    includes parameters with `requires_grad = True`
    """
    parameters = list(m.parameters())
    if only_trainable:
        parameters = [p for p in parameters if p.requires_grad]
    unique = {p.data_ptr(): p for p in parameters}.values()
    return sum(p.numel() for p in unique)
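
To illustrate why de-duplicating on data_ptr matters (a contrived example of mine, not from the answer): two distinct Parameter objects can wrap the same underlying storage, in which case model.parameters() yields both, but the numel() function above counts the shared storage only once.

import torch

lin1 = torch.nn.Linear(10, 10, bias=False)
lin2 = torch.nn.Linear(10, 10, bias=False)
# Two distinct Parameter objects sharing one underlying tensor (no copy is made).
lin2.weight = torch.nn.Parameter(lin1.weight.data)

model = torch.nn.Sequential(lin1, lin2)
print(sum(p.numel() for p in model.parameters()))  # 200 -- the shared weight is counted twice
print(numel(model))                                # 100 -- the shared storage is counted once
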
Answered By: teichert

You can use torchsummary to do the same thing. It’s just two lines of code.

from torchsummary import summary

summary(model, input_shape)  # e.g. input_shape = (3, 224, 224); summary() prints the table itself
Answered By: Srujan2k21

There is a built-in utility function to convert an iterable of tensors into a single flat tensor, torch.nn.utils.parameters_to_vector; combine it with torch.numel:

torch.nn.utils.parameters_to_vector(model.parameters()).numel()

Or shorter with a named import (from torch.nn.utils import parameters_to_vector):

parameters_to_vector(model.parameters()).numel()
Answered By: Ivan

As @fábio-perez mentioned, there is no such built-in function in PyTorch.

However, I found this to be a compact and neat way of achieving the same result:

num_of_parameters = sum(map(torch.numel, model.parameters()))
Answered By: A. Maman

Straight and simple

print(sum(p.numel() for p in model.parameters()))
Answered By: Prajot Kuvalekar

Final answer you can plug in:

from torch import nn


def count_number_of_parameters(model: nn.Module, only_trainable: bool = True) -> int:
    """
    Counts the number of trainable params. To count all params, pass only_trainable=False.

    Ref:
        - https://discuss.pytorch.org/t/how-do-i-check-the-number-of-parameters-of-a-model/4325/9?u=brando_miranda
        - https://stackoverflow.com/questions/49201236/check-the-total-number-of-parameters-in-a-pytorch-model/62764464#62764464
    :return:
    """
    if only_trainable:
        num_params: int = sum(p.numel() for p in model.parameters() if p.requires_grad)
    else:  # counts trainable and non-trainable parameters
        num_params: int = sum(p.numel() for p in model.parameters())
    assert num_params > 0, f'Err: {num_params=}'
    return int(num_params)
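
A hypothetical usage example (the torchvision model and the approximate count are illustrative, not from the original answer):

from torchvision.models import resnet18

net = resnet18()
print(count_number_of_parameters(net))                        # roughly 11.7 million trainable params
print(count_number_of_parameters(net, only_trainable=False))  # same here, since nothing is frozen
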
Answered By: Charlie Parker

None of the answers fully address the case where different parameters share memory, including the answers that use numel, PrettyTable, and .data_ptr. @teichert gave a great answer that handles the case where two different parameters point to the exact same tensor. But what if one parameter is a slice of another? Though they would share some memory, using .data_ptr() naively would come up with different results – so there would still be overcounting with his approach.

To be thorough, you need to ensure that none of the entries in any of the tensors point to the same thing. This can be accomplished by using set-comprehensions:

Include non-trainable parameters:

len({e.data_ptr() for p in model.parameters() for e in p.view(-1)})

Ignore non-trainable parameters:

len({e.data_ptr() for p in model.parameters() if p.requires_grad for e in p.view(-1)})

If tensors can share memory, how about counting the number of unique tensors? This sounds like a tricky interview problem, but it’s easy with the UnionFind data structure! If you don’t want to pip install it, just copy this file verbatim as a drop-in replacement.

Pass your model into this function, and it will not overcount if there is any memory sharing even if some parameters are slices of others.

def num_parameters(model, show_only_trainable):
    from UnionFind import UnionFind
    u = UnionFind()
    for p in model.parameters():
        if not show_only_trainable or p.requires_grad:
            u.union(*[e.data_ptr() for e in p.view(-1)])
    print(f'Number of parameters: {len(u)}')
    print(f'Number of tensors: {u.num_connected_components}')

This code demonstrates the problem with using the other techniques and how using the above function fixes it.

>>> import torch.nn as nn
>>> import torch
>>> torch.manual_seed(0)
>>> 
>>> # This layer is not trainable
>>> frozen_layer = nn.Linear(out_features=3, in_features=4, bias=False)
>>> for p in frozen_layer.parameters():
...     p.requires_grad = False
... 
>>> 
>>> # There are 4*2 + 3*4 = 20 total parameters
>>> # There are 4*2 = 8 trainable parameters
>>> model = nn.Sequential(
...     nn.Linear(out_features=4, in_features=2, bias=False),
...     nn.ReLU(),
...     frozen_layer,
...     nn.Sigmoid()
... )
>>> 
>>> # Parameters seem properly accounted for so far
>>> sum(dict((p.data_ptr(), p.numel()) for p in model.parameters()).values())
20
>>> sum(dict((p.data_ptr(), p.numel()) for p in model.parameters() if p.requires_grad).values())
8
>>> 
>>> # Add a new Parameter that is an arbitrary slice of an existing Parameter.
>>> # NOTE that slice syntax `[]` and wrapping with `nn.Parameter()` do
>>> # NOT copy the data, but merely point to part of existing tensor.
>>> model.newparam = nn.Parameter(next(model.parameters())[0:2, 1:2])
>>> 
>>> params = list(model.parameters())
>>> 
>>> # Notice that both appear the same. Do they share memory?
>>> # `params[0]` is `model.newparam`. `params[1]` is tensor that `params[0]` was sliced from.
>>> 
>>> params[0]
Parameter containing:
tensor([[-0.4683],
        [ 0.0262]], requires_grad=True)
>>> 
>>> params[1][0:2, 1:2]
tensor([[-0.4683],
        [ 0.0262]], grad_fn=<SliceBackward0>)
>>> 
>>> with torch.no_grad():
...     params[0][0, 0] = 1.2345
... 
>>> 
>>> # Both have changed, proving that they DO share memory.
>>> 
>>> params[0]
Parameter containing:
tensor([[1.2345],
        [0.0262]], requires_grad=True)
>>> 
>>> params[1][0:2, 1:2]
tensor([[1.2345],
        [0.0262]], grad_fn=<SliceBackward0>)
>>> 
>>> # WRONG - the number of parameters "appears" to have increased by 2 (because of `model.newparam`).
>>> sum(dict((p.data_ptr(), p.numel()) for p in model.parameters()).values())
22
>>> sum(dict((p.data_ptr(), p.numel()) for p in model.parameters() if p.requires_grad).values())
10
>>> 
>>> # CORRECT - this discounts all shared parameters
>>> len({e.data_ptr() for p in model.parameters() for e in p.view(-1)})
20
>>> len({e.data_ptr() for p in model.parameters() if p.requires_grad for e in p.view(-1)})
8
>>> 
>>> # To count unique tensors, we can use this function.
>>> # It utilizes the UnionFind data structure which can be dropped in directly from here:
>>> # https://gist.github.com/timgianitsos/0878a0b241cb5d0ad8b16ebc2b14322a
>>> def num_parameters(model, show_only_trainable):
...     from UnionFind import UnionFind
...     u = UnionFind()
...     for p in model.parameters():
...         if not show_only_trainable or p.requires_grad:
...             u.union(*[e.data_ptr() for e in p.view(-1)])
...     print(f'Number of parameters: {len(u)}')
...     print(f'Number of tensors: {u.num_connected_components}')
... 
>>> 
>>> # Notice that the problem has been fixed
>>> num_parameters(model, show_only_trainable=False)
Number of parameters: 20
Number of tensors: 2
>>> num_parameters(model, show_only_trainable=True)
Number of parameters: 8
Number of tensors: 1

Bear in mind that my num_parameters() function takes longer to run than the other solutions, since it has to loop over all the entries in all the tensors – about 2 minutes on my Mac’s CPU for a 22-million-parameter model. It can be made much faster by exploiting the fact that the data pointers of consecutive elements in a tensor differ by a constant amount (e.g. 4 bytes if the tensor is torch.float32). But this requires taking the tensors’ dtype and stride into account, which is probably overkill if you are willing to wait a few minutes for anything larger than 20 million parameters.
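
For reference, here is one possible sketch of that optimization (mine, not the original answerer’s): it derives every element’s address arithmetically from data_ptr(), stride() and element_size() instead of iterating over tensor elements in Python, and should agree with the set-comprehension counts above.

import torch

def fast_unique_element_count(model, only_trainable=False):
    # Count unique element addresses without a Python loop over elements:
    # the element at index (i0, ..., ik) lives at byte address
    # data_ptr() + (i0*stride0 + ... + ik*stridek) * element_size().
    addresses = set()
    for p in model.parameters():
        if only_trainable and not p.requires_grad:
            continue
        if p.numel() == 0:
            continue
        offsets = torch.zeros(p.shape, dtype=torch.int64)
        for dim, (size, stride) in enumerate(zip(p.shape, p.stride())):
            shape = [1] * p.dim()
            shape[dim] = size
            offsets = offsets + (torch.arange(size, dtype=torch.int64) * stride).view(shape)
        addresses.update((p.data_ptr() + offsets.reshape(-1) * p.element_size()).tolist())
    return len(addresses)

It still builds a Python set with one entry per element, so memory use is comparable, but the per-element arithmetic runs inside vectorized torch ops rather than a Python loop.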

Answered By: efthimio

I assume you may be able to use model.num_parameters()
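
For context: plain torch.nn.Module has no num_parameters() method; it is provided by some libraries, for example Hugging Face transformers’ PreTrainedModel. A small sketch assuming a transformers model:

from transformers import AutoModel

# Assumes a Hugging Face transformers model; a plain nn.Module has no num_parameters().
model = AutoModel.from_pretrained("bert-base-uncased")
print(model.num_parameters())                     # all parameters
print(model.num_parameters(only_trainable=True))  # trainable parameters only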

Answered By: Ignatius Ezeani