Why are the MNIST images 1x28x28 tensors?

Question:

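I converted the MNIST images, which are 28×28 pixel images, into tensors with

from torchvision.datasets import MNIST
import torchvision.transforms as transforms

dataset = MNIST(root='data/', train=True, download=True, transform=transforms.ToTensor())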

and when I run

img_tensor, label = dataset[0]
print(img_tensor.shape, label)

It says the shape is torch.Size([1, 28, 28]).
Why is it 1×28×28? What does the first dimension mean? And what is the point of 1×28×28 as opposed to 28×28?

Asked By: Spageti Man


Answers:

An image tensor in PyTorch (as produced by ToTensor) has three dimensions: channels, height and width, in that order (C, H, W). The two 28s are the height and width, and the 1 is the number of channels. So what is a channel? In a color image, every pixel is described by three values: red, green and blue. Each color gets its own channel, so a color image normally has 3 channels (RGB) and its shape is (3, H, W). So why is there a 1 here? The MNIST images are grayscale, so they don't need three color channels to represent each pixel; a single intensity value per pixel is enough. Therefore grayscale images have the shape (1, H, W).
[Figure: an RGB image separated into its red, green and blue channels. Source: https://commons.wikimedia.org/wiki/File:RGB_channels_separation.png]
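To make the (C, H, W) layout concrete, here is a minimal sketch that mimics what ToTensor does to a uint8 image: it moves the channel axis to the front and scales the values to [0, 1]. The helper name `to_chw_tensor` is hypothetical and the random tensors stand in for real images; this is an illustration, not torchvision's actual implementation.

```python
import torch

def to_chw_tensor(hwc: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper mimicking ToTensor for uint8 images:
    (H, W, C) or (H, W) uint8 -> (C, H, W) float32 in [0, 1]."""
    if hwc.dim() == 2:              # grayscale stored as (H, W): add a channel axis
        hwc = hwc.unsqueeze(-1)     # -> (H, W, 1)
    return hwc.permute(2, 0, 1).float().div(255)  # channels first, scale to [0, 1]

rgb = torch.randint(0, 256, (24, 32, 3), dtype=torch.uint8)  # height 24, width 32, 3 channels
gray = torch.randint(0, 256, (28, 28), dtype=torch.uint8)    # an MNIST-sized grayscale image

print(to_chw_tensor(rgb).shape)   # torch.Size([3, 24, 32])
print(to_chw_tensor(gray).shape)  # torch.Size([1, 28, 28])
```

The only difference between the two results is the size of the first (channel) dimension: 3 for color, 1 for grayscale.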

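So for black-and-white (grayscale) images, one channel is enough.
You could often ignore the size-1 dimension, but PyTorch keeps it: ToTensor always returns a (C, H, W) tensor, and convolution layers such as nn.Conv2d expect an explicit channel dimension.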
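A short sketch of both sides of this, assuming a random tensor in place of a real MNIST image: squeezing away the channel axis gives a plain 28×28 matrix (handy for plotting), while a convolution layer is built with `in_channels=1` and receives the image with an added batch axis. The layer sizes (8 output channels, 3×3 kernel) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

img = torch.rand(1, 28, 28)          # stand-in for one MNIST image: (C, H, W)

# Dropping the channel axis gives a plain 28x28 matrix, e.g. for plotting.
print(img.squeeze(0).shape)          # torch.Size([28, 28])

# The conv layer must be told how many input channels the images have.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)
out = conv(img.unsqueeze(0))         # add a batch axis: (1, 1, 28, 28)
print(out.shape)                     # torch.Size([1, 8, 26, 26])
```

With no padding, a 3×3 kernel shrinks each spatial side by 2, hence 26×26.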

Answered By: Theodor Peifer

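PyTorch convolutions (for example nn.Conv2d) operate on tensors ordered as (N, C, H, W): batch, channel, height, width. A single image from the dataset has shape (C, H, W); the batch dimension is added when images are grouped into a batch, e.g. by a DataLoader.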
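A minimal sketch of where the batch dimension comes from, using `TensorDataset` with random data as a hypothetical stand-in for MNIST: each item is (1, 28, 28), and the DataLoader's default collation stacks items into an (N, C, H, W) batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 100 random grayscale images, each (1, 28, 28).
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

batch, batch_labels = next(iter(loader))
print(batch.shape)         # torch.Size([32, 1, 28, 28])  -> (N, C, H, W)
print(batch_labels.shape)  # torch.Size([32])
```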

Answered By: Nivesh Gadipudi

The first dimension tracks color channels. The second and third dimensions represent pixels along the height and width of the image, respectively. Since images in the MNIST dataset are grayscale, there’s just one channel. Other datasets have images with color, in which case there are three channels: red, green, and blue (RGB).
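Since grayscale and color images differ only in the size of the channel dimension, converting between them amounts to changing that axis. This sketch uses a random tensor in place of an MNIST image and plain channel averaging for the reverse direction; real RGB-to-grayscale conversions often use a weighted sum of the channels instead.

```python
import torch

gray = torch.rand(1, 28, 28)            # MNIST-style grayscale: one channel
rgb = gray.repeat(3, 1, 1)              # copy the single channel into R, G and B
print(rgb.shape)                        # torch.Size([3, 28, 28])

back = rgb.mean(dim=0, keepdim=True)    # average the channels, keep the channel axis
print(back.shape)                       # torch.Size([1, 28, 28])
print(torch.allclose(back, gray))       # True
```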

Answered By: sherax_ 139