FashionMNIST Dataset not transforming to Tensor

Question:

I am trying to calculate the mean and standard deviation of the dataset so that I can normalise it afterwards.

Current Code:

train_dataset = datasets.FashionMNIST('data', train=True, download = True, transform=[transforms.ToTensor()])
test_dataset = datasets.FashionMNIST('data', train=False, download = True, transform=[transforms.ToTensor()])

def calc_torch_mean_std(tens):   
    mean = torch.mean(tens, dim=1)
    std = torch.sqrt(torch.mean((tens - mean[:, None]) ** 2, dim=1))
    return(std, mean)

train_mean, train_std = calc_torch_mean_std(train_dataset)

test_mean, test_std = calc_torch_mean_std(test_dataset)

However, I'm getting the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/16/crymx03s6pzfspm_3qfrlkx00000gn/T/ipykernel_72423/605045038.py in <module>
      8     return(std, mean)
      9 
---> 10 train_mean, train_std = calc_torch_mean_std(train_dataset)
     11 
     12 test_mean, test_std = calc_torch_mean_std(test_dataset)

/var/folders/16/crymx03s6pzfspm_3qfrlkx00000gn/T/ipykernel_72423/605045038.py in calc_torch_mean_std(tens)
      4 
      5 def calc_torch_mean_std(tens):
----> 6     mean = torch.mean(tens, dim=1)
      7     std = torch.sqrt(torch.mean((tens - mean[:, None]) ** 2, dim=1))
      8     return(std, mean)

TypeError: mean() received an invalid combination of arguments - got (FashionMNIST, dim=int), but expected one of:
 * (Tensor input, *, torch.dtype dtype)
 * (Tensor input, tuple of ints dim, bool keepdim, *, torch.dtype dtype, Tensor out)
 * (Tensor input, tuple of names dim, bool keepdim, *, torch.dtype dtype, Tensor out)

It should be getting a tensor, as I transform the data as it comes in using transforms.ToTensor().

I checked the import of transforms and it is okay. I also checked the parameters for datasets.FashionMNIST() and transform looks correctly used (I believe it should work both with and without [ ]).

I am expecting no error, and to get the mean and std for both datasets.

Asked By: HBridges

||

Answers:

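datasets.FashionMNIST returns a dataset object, not a tensor, which is why torch.mean rejects it. The transform is only applied when an item is fetched, and each item is an (image, target) pair where target is the index of the target class. So if you want to take the mean you need to extract just the image from each pair.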

images = torch.vstack([pair[0] for pair in train_dataset])

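images should now be of shape (N, H, W), since each image comes out of ToTensor() as (1, H, W) and vstack joins them along the first dimension. You can do whatever you want from there. Note that this needs transform=transforms.ToTensor() rather than transform=[transforms.ToTensor()]: the dataset calls the transform on each image, and a list is not callable.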
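
As a rough sketch of where to go from there, the snippet below computes a single mean and standard deviation over all pixels of a dataset that yields (image, target) pairs. The helper name dataset_mean_std and the two-image fake_dataset are made up for illustration so the example runs without downloading anything; in practice you would pass train_dataset (created with transform=transforms.ToTensor()) instead.

```python
import torch


def dataset_mean_std(dataset):
    """Return (mean, std) over every pixel of an (image, target) dataset."""
    # Keep only the image from each (image, target) pair and stack them
    # into one tensor of shape (N, C, H, W).
    images = torch.stack([image for image, _ in dataset]).float()
    mean = images.mean()
    # Population standard deviation, same formula as in the question.
    std = ((images - mean) ** 2).mean().sqrt()
    return mean.item(), std.item()


# Hypothetical stand-in for FashionMNIST: one all-black and one all-white
# 28x28 image, each paired with a class index.
fake_dataset = [
    (torch.zeros(1, 28, 28), 0),
    (torch.ones(1, 28, 28), 1),
]

mean, std = dataset_mean_std(fake_dataset)
print(mean, std)
```

The function returns the values in (mean, std) order, matching how they are unpacked at the call site.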

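Another solution, as noted by OP, is to use train_dataset.data to directly access the data. Be aware that this is the raw uint8 tensor of shape (N, H, W) with values from 0 to 255; the transform is not applied to it, so convert it to float and divide by 255 if you want statistics on the same scale that ToTensor() produces.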
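
A minimal sketch of that second route, assuming the raw data is a uint8 tensor with values in 0 to 255. The small tensor named raw below is a hypothetical stand-in for train_dataset.data so that the example runs on its own.

```python
import torch

# Hypothetical stand-in for train_dataset.data: uint8, shape (N, H, W).
raw = torch.tensor(
    [[[0, 255], [255, 0]],
     [[0, 0], [255, 255]]],
    dtype=torch.uint8,
)

# Convert to float and rescale to [0, 1], the range ToTensor() would give.
scaled = raw.float() / 255.0

data_mean = scaled.mean().item()
# Population standard deviation over all pixels.
data_std = ((scaled - data_mean) ** 2).mean().sqrt().item()
print(data_mean, data_std)
```

Calling torch.mean directly on a uint8 tensor raises an error, so the conversion to float is needed before taking any statistics.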

Answered By: Chrispresso