What does the underscore suffix in PyTorch functions mean?
Question:
In PyTorch, many methods of a tensor exist in two versions – one with an underscore suffix, and one without. If I try them out, they seem to do the same thing:
In [1]: import torch
In [2]: a = torch.tensor([2, 4, 6])
In [3]: a.add(10)
Out[3]: tensor([12, 14, 16])
In [4]: a.add_(10)
Out[4]: tensor([12, 14, 16])
What is the difference between
torch.add
and torch.add_
torch.sub
and torch.sub_
- …and so on?
Answers:
According to the documentation, methods which end in an underscore change the tensor in-place. That means that no new memory is allocated for the result, which can save memory and sometimes improve performance, but in-place operations can also cause problems with autograd and, in some cases, even worse performance in PyTorch.
In [2]: a = torch.tensor([2, 4, 6])
tensor.add():
In [3]: b = a.add(10)
In [4]: a is b
Out[4]: False # b is a new tensor, new memory was allocated
tensor.add_():
In [3]: b = a.add_(10)
In [4]: a is b
Out[4]: True # Same object, no new memory was allocated
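The calls in the question only look identical because each one happened to print the same numbers. A quick sketch (plain PyTorch, no extra assumptions) makes the mutation itself visible:

```python
import torch

a = torch.tensor([2, 4, 6])
b = a.add(10)    # out-of-place: result goes into a new tensor
print(a)         # tensor([2, 4, 6])   -- a is unchanged
print(b)         # tensor([12, 14, 16])

a.add_(10)       # in-place: a itself is modified
print(a)         # tensor([12, 14, 16])
```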
Notice that the operators + and += are also two different implementations: + creates a new tensor using .add(), while += modifies the tensor in place using .add_():
In [2]: a = torch.tensor([2, 4, 6])
In [3]: id(a)
Out[3]: 140250660654104
In [4]: a += 10
In [5]: id(a)
Out[5]: 140250660654104 # Still the same object, no memory allocation was required
In [6]: a = a + 10
In [7]: id(a)
Out[7]: 140250649668272 # New object was created
You have already answered your own question: the underscore indicates in-place operations in PyTorch. However, I want to briefly point out why in-place operations can be problematic:
- First of all, the PyTorch documentation recommends avoiding in-place operations in most cases. Unless you are working under heavy memory pressure, it is usually more efficient not to use them: https://pytorch.org/docs/stable/notes/autograd.html#in-place-operations-with-autograd
- Secondly, there can be problems calculating the gradients when using in-place operations:
Every tensor keeps a version counter, that is incremented every time
it is marked dirty in any operation. When a Function saves any tensors
for backward, a version counter of their containing Tensor is saved as
well. Once you access self.saved_tensors
it is checked, and if it is
greater than the saved value an error is raised. This ensures that if
you’re using in-place functions and not seeing any errors, you can be
sure that the computed gradients are correct.
Same source as above.
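The version counter the quote describes can actually be inspected through the `_version` attribute. Note this is an internal attribute, so its exact behavior may change between PyTorch versions – the sketch below is only meant to illustrate the mechanism:

```python
import torch

a = torch.tensor([2.0, 4.0, 6.0])
print(a._version)   # 0 -- freshly created tensor

a.add_(10)          # in-place op marks the tensor dirty
print(a._version)   # counter was incremented to 1

b = a.add(10)       # out-of-place op leaves a's counter alone
print(a._version)   # still 1; b starts with its own counter at 0
```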
Here is a short and slightly modified example taken from the answer you’ve posted:
First the in-place version:
import torch
a = torch.tensor([2, 4, 6], requires_grad=True, dtype=torch.float)
adding_tensor = torch.rand(3)
b = a.add_(adding_tensor)
c = torch.sum(b)
c.backward()
print(c.grad_fn)
Which leads to this error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-27-c38b252ffe5f> in <module>
2 a = torch.tensor([2, 4, 6], requires_grad=True, dtype=torch.float)
3 adding_tensor = torch.rand(3)
----> 4 b = a.add_(adding_tensor)
5 c = torch.sum(b)
6 c.backward()
RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
Secondly the non in-place version:
import torch
a = torch.tensor([2, 4, 6], requires_grad=True, dtype=torch.float)
adding_tensor = torch.rand(3)
b = a.add(adding_tensor)
c = torch.sum(b)
c.backward()
print(c.grad_fn)
Which works just fine – output:
<SumBackward0 object at 0x7f06b27a1da0>
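If you do need to update a leaf tensor that requires grad in place – which is essentially what optimizers do with model parameters – one common way around the error above is to perform the update inside torch.no_grad(). A sketch under that assumption:

```python
import torch

w = torch.tensor([2.0, 4.0, 6.0], requires_grad=True)
loss = (w * w).sum()
loss.backward()             # w.grad is now 2 * w

with torch.no_grad():
    w -= 0.1 * w.grad       # in-place update, not tracked by autograd

print(w)                    # updated in place; still a leaf with requires_grad=True
```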
So as a take-away I just wanted to point out to carefully use in-place operations in PyTorch.
In PyTorch, a method name that ends with an underscore follows a convention indicating that the method will not return a new tensor but will instead modify the tensor in place. For example, scatter_.
https://yuyangyy.medium.com/understand-torch-scatter-b0fd6275331c