PyTorch: different outputs with and without transpose

Question:

Say I have a tensor of shape (B, N^2, C)
and I want to reshape it into (B, C, N, N).

I think I have two choices, shown below:

A = torch.rand(5, 100, 20) # Original Tensor

# First Method
B = A.transpose(2, 1)
B = B.view(5, 20, 10, 10)

# Second Method
C = A.view(5, 20, 10, 10)

Both methods work, but the outputs differ and I cannot pin down the difference between them.

Thanks

Asked By: Suho Cho


Answers:

The difference between B and C is that for B you used torch.transpose, which swaps two axes: it changes the strides, i.e. the order in which the elements are traversed, without touching the underlying buffer. The view at the end is just a convenient interface for accessing your data; it has no effect on the underlying data either. What it all comes down to is how the tensor maps onto its contiguous memory buffer.
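You can check this directly on the tensors from the question (a quick sketch; transpose only swaps the strides and does not copy):

>>> import torch
>>> A = torch.rand(5, 100, 20)
>>> B = A.transpose(2, 1)
>>> B.is_contiguous()
False
>>> A.data_ptr() == B.data_ptr()  # same buffer, different traversal order
True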

If you take a smaller example, something we can grasp more easily:

>>> A = torch.rand(1, 4, 3)
tensor([[[0.4543, 0.9766, 0.0123],
         [0.7447, 0.2732, 0.7260],
         [0.7814, 0.4766, 0.8939],
         [0.3444, 0.0387, 0.8581]]])

Here swapping axis=1 and axis=2 comes down to a batched transpose (in mathematical terms):

>>> B = A.transpose(2, 1)
tensor([[[0.4543, 0.7447, 0.7814, 0.3444],
         [0.9766, 0.2732, 0.4766, 0.0387],
         [0.0123, 0.7260, 0.8939, 0.8581]]])

In terms of memory layout, A has the following arrangement:

>>> A.flatten()
tensor([0.4543, 0.9766, 0.0123, 0.7447, 0.2732, 0.7260, 0.7814, 0.4766, 0.8939,
        0.3444, 0.0387, 0.8581])

While B, once flattened, comes out in a different order. By layout I mean the order of the elements, not the shape, which is irrelevant here:

>>> B.flatten()
tensor([0.4543, 0.7447, 0.7814, 0.3444, 0.9766, 0.2732, 0.4766, 0.0387, 0.0123,
        0.7260, 0.8939, 0.8581])

As I said, reshaping (i.e. building a view on top of a tensor) doesn't change its memory layout; it's an abstraction layer that lets you manipulate tensors more conveniently.

So in the end, yes, you end up with two different results: C reads A's buffer in its original order, while B traverses the very same buffer in the transposed order, so the values end up in different positions.
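Applied to the exact shapes from the question, a short sketch confirms both points: the results differ elementwise, yet all three tensors share one buffer:

>>> A = torch.rand(5, 100, 20)
>>> B = A.transpose(2, 1).view(5, 20, 10, 10)  # first method
>>> C = A.view(5, 20, 10, 10)                  # second method
>>> torch.equal(B, C)
False
>>> B.data_ptr() == C.data_ptr() == A.data_ptr()
True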

Answered By: Ivan

Transposing/permuting and view/reshape are NOT the same!
reshape and view only affect the shape of a tensor, but do not change the order in which the elements are read.
In contrast, transpose and permute change the order in which the tensor's elements are traversed. See this answer, and this one for more details.

Here’s an example with B=1, N=3 and C=2: the first channel holds the even numbers 0..16, and the second channel the odd numbers 1..17:

A = torch.arange(2*9).view(1,9,2)
tensor([[[ 0,  1],
         [ 2,  3],
         [ 4,  5],
         [ 6,  7],
         [ 8,  9],
         [10, 11],
         [12, 13],
         [14, 15],
         [16, 17]]])

If you correctly transpose and then reshape, you get the correct split into even and odd channels:

A.transpose(1,2).view(1,2,3,3)
tensor([[[[ 0,  2,  4],
          [ 6,  8, 10],
          [12, 14, 16]],

         [[ 1,  3,  5],
          [ 7,  9, 11],
          [13, 15, 17]]]])

However, if you only change the shape (i.e., using view or reshape), you incorrectly "mix" the values from the two channels:

A.view(1,2,3,3)
tensor([[[[ 0,  1,  2],
          [ 3,  4,  5],
          [ 6,  7,  8]],

         [[ 9, 10, 11],
          [12, 13, 14],
          [15, 16, 17]]]])
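A quick way to convince yourself which result is the right one (a small sketch that just checks the parity of the first channel):

correct = A.transpose(1, 2).view(1, 2, 3, 3)
(correct[0, 0] % 2 == 0).all()  # -> tensor(True): channel 0 holds only even numbers

wrong = A.view(1, 2, 3, 3)
(wrong[0, 0] % 2 == 0).all()    # -> tensor(False): the channels got mixed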


Update (Aug 31st, 2022)
Take a look at this simple example:

# original tensor
x = torch.arange(12).view(3,4)
x.data_ptr()  # -> 94308398597888
x.stride()    # -> (4, 1)

# transpose
x1 = x.transpose(0, 1)
x1.data_ptr()  # -> 94308398597888  (same data)
x1.stride()    # -> (1, 4)  the stride representation can express this without copying

# messing around a bit more:
x1.view(3,4)

# strides cannot cut it anymore - we get an error
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

# using reshape:
x2 = x1.reshape(3, 4)
x2.data_ptr()  # -> 94308399099200 (NOT the same data)
x2.stride()    # -> (4, 1)
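
For completeness, the usual fix when view fails like this is to call .contiguous() first; it copies the data into the transposed order, which is exactly what reshape did above (a minimal sketch):

x3 = x1.contiguous().view(3, 4)
x3.data_ptr() == x.data_ptr()      # -> False (contiguous() made a copy)
torch.equal(x3, x1.reshape(3, 4))  # -> True (same result as reshape)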
Answered By: Shai