PyTorch identifying batch size as number of channels in Conv2d layer

Question:

I am a total newbie to neural networks, using PyTorch to create a VAE model. I've used a bit of TensorFlow before, but I have no idea what "in_channels" and "out_channels" mean as arguments to nn.Conv2d/nn.Conv1d.

Disclaimers aside, my model currently takes in a DataLoader with batch size 128, where each input is a 248 by 46 tensor (so each batch is a 128 x 248 x 46 tensor).

My encoder looks like this right now — I chopped it down so I could focus on where the error was coming from.

class Encoder(nn.Module):
    def __init__(self, latent_dim):
        super(Encoder, self).__init__()
        self.latent_dim = latent_dim
        self.conv1 = nn.Conv2d(in_channels=248, out_channels=46, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))

    def forward(self, x):
        print(x.size())
        x = F.relu(self.conv1(x))
        return x

The Conv2d layer was meant to reduce the 248 by 46 input into a 50 by 46 tensor. However, I get this error:

RuntimeError: Given groups=1, weight of size [46, 248, 9, 9], expected input[1, 128, 248, 46] to have 248 channels, but got 128 channels instead

…even though when I print x.size() it displays torch.Size([128, 248, 46]).

I am unsure a) why the error shows that the layer is adding an extra dimension to x, and b) whether I am even understanding channels correctly. Should 46 be the real number of channels? Why doesn't PyTorch simply ask for my input size as a tuple, like in=(248, 46)?
Or c) whether this is an issue with the way I loaded my data into the model. I have a numpy array data of shape (-1, 248, 46), and I started training my model as follows.

tensor_data = torch.from_numpy(data)
dataset = TensorDataset(tensor_data, tensor_data)
train_dl = DataLoader(dataset, batch_size=128, shuffle=True)
...
for epoch in range(20):
    for x_train, y_train in train_dl:
        x_train = x_train.to(device).float()
        optimizer.zero_grad()
        x_pred, mu, log_var = vae(x_train)
        bce_loss = train.BCE(y_train, x_pred)
        kl_loss = train.KL(mu, log_var)
        loss = bce_loss + kl_loss
        loss.backward()
        optimizer.step()

Any thoughts appreciated!

Asked By: Racecar

Answers:

Let's say your model takes a single-channel 28x28 image. Flattened, that is 784 features, but that number is what a fully connected (nn.Linear) layer would take, not in_channels. For nn.Conv2d, in_channels is the number of channels of the image itself (1 for grayscale, 3 for RGB), and out_channels is the number of feature maps the layer produces; it is not the number of classes.
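
To make that concrete, here is a minimal toy sketch (a standalone example, not the asker's model) of what in_channels and out_channels actually control in nn.Conv2d, compared with an nn.Linear layer that sees the flattened 784 features:

import torch
import torch.nn as nn

x = torch.rand(4, 1, 28, 28)   # a batch of 4 single-channel 28x28 images: [batch, channels, height, width]

conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
print(conv.weight.shape)       # torch.Size([16, 1, 3, 3]) -> [out_channels, in_channels, kernel_h, kernel_w]
print(conv(x).shape)           # torch.Size([4, 16, 28, 28]) -> 16 feature maps, not class scores

fc = nn.Linear(in_features=28 * 28, out_features=10)   # 10 classes, for example
print(fc(x.flatten(start_dim=1)).shape)                # torch.Size([4, 10]) -> the flattened 784 goes here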

In PyTorch, nn.Conv2d expects the input (typically image data) to be shaped [B, C_in, H, W], where B is the batch size, C_in is the number of channels, and H and W are the height and width of the image. The output has a similar shape, [B, C_out, H_out, W_out]. Here, C_in and C_out are in_channels and out_channels, respectively. (H_out, W_out) is the output image size, which may or may not equal (H, W), depending on the kernel size, the stride and the padding.

However, it is confusing to apply Conv2d to reduce [128, 248, 46] inputs to [128, 50, 46]. Is this image-like data with height 248 and width 46? If so, you can reshape the inputs to [128, 1, 248, 46] and use in_channels = 1 and out_channels = 1 in Conv2d.
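
(This also answers part a) of the question: recent PyTorch versions accept unbatched 3D input for Conv2d, so a raw [128, 248, 46] tensor is read as a single image with C = 128, H = 248, W = 46 and an implicit batch dimension of 1, which is consistent with the input[1, 128, 248, 46] reported in the error message.) To make the shapes concrete, here is a minimal sketch of that reshaping, using unsqueeze and the kernel/stride/padding from the question:

import torch
import torch.nn as nn

x = torch.rand(128, 248, 46)   # [batch, height, width] -- no channel dimension yet
x = x.unsqueeze(1)             # -> [128, 1, 248, 46]: explicit channel dimension of 1

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))
out = conv(x)
print(out.shape)               # torch.Size([128, 1, 50, 46])

# each spatial dimension follows: out = floor((in + 2*padding - kernel) / stride) + 1
# height: (248 + 2*5 - 9) // 5 + 1 = 50,  width: (46 + 2*4 - 9) // 1 + 1 = 46
print(out.squeeze(1).shape)    # torch.Size([128, 50, 46]) if you want to drop the channel dim again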

Answered By: ihdv

You need to add an extra dimension for the number of channels (1) with the view function. The below code will work!

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))

    def forward(self, x):
        print("encoder input size: " + str(x.shape))
        # x.shape[0] is the batch size when x is [batch, 248, 46];
        # for a single unbatched sample of shape [248, 46] it would be 248 instead,
        # so always keep a batch dimension (see the single-sample example below)
        # target shape: (batch size, number of channels, height, width)
        x = x.view(x.shape[0], 1, 248, 46)
        print("encoder input size after adding 1 channel to shape: " + str(x.shape))
        x = F.relu(self.conv1(x))
        return x

# a test dataset with 128 samples, each 248 (height) by 46 (width)
test_dataset = torch.rand(128, 248, 46)
# print the shape of the dataset
print(test_dataset.shape)

model = Encoder()
model(test_dataset)

# if you are passing only one sample to the model (e.g. to plot), keep a leading batch dimension of 1
test_dataset2 = torch.rand(1, 248, 46)
model(test_dataset2.view(test_dataset2.shape[0], 1, 248, 46))

Answered By: Isaac Zhao
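
A final note, building on the asker's data-loading code: instead of reshaping inside forward, you can add the channel dimension once when building the dataset, so the encoder does not have to hard-code 248 x 46. A sketch (the random array below is just a stand-in for the real (-1, 248, 46) data):

import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

data = np.random.rand(512, 248, 46).astype(np.float32)   # stand-in for the real array

tensor_data = torch.from_numpy(data).unsqueeze(1)         # [N, 1, 248, 46]: channel dimension added once
dataset = TensorDataset(tensor_data, tensor_data)
train_dl = DataLoader(dataset, batch_size=128, shuffle=True)

for x_train, _ in train_dl:
    print(x_train.shape)   # torch.Size([128, 1, 248, 46]) (the last batch may be smaller if N isn't a multiple of 128)
    break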