How to find features and targets for a subset of data?

Question:

I’m trying to count how many datapoints of each class there are in a subset of a dataset. Below is some example code:

# import the required modules
import torch
import torchvision
from torchvision.datasets import CIFAR10
from collections import Counter

trainset = CIFAR10(root='./data', train=True, download=True, transform=torchvision.transforms.ToTensor())

indices = torch.arange(3000)
new_trainset_split = torch.utils.data.Subset(trainset, indices)

This gives me a subset of 3000 datapoints, which is exactly what I want. However, when I run the next line to see how many datapoints there are of each class (in this case, how many of them are 1’s, 2’s, 3’s, etc.), I get an error:

print(dict(Counter(new_trainset_split.targets)))

AttributeError: 'Subset' object has no attribute 'targets'

How do I find the features and targets for a subset of the data?

Asked By: zampoan


Answers:

You can’t access targets because data.Subset doesn’t expose the attributes of the data.Dataset it wraps (in your case, datasets.CIFAR10).

An easy workaround is to index the original dataset’s targets with indices directly:

>>> Counter(trainset.targets[i] for i in indices) 
Counter({0: 299,
         1: 287,
         2: 322,
         3: 285,
         4: 311,
         5: 279,
         6: 312,
         7: 297,
         8: 308,
         9: 300})
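
If the wrapped dataset does not expose a targets attribute at all, a slower fallback is to iterate over the subset itself. The following is only a minimal sketch: the function name count_labels_by_iteration is hypothetical, and it assumes every item is an (input, label) pair whose label can be converted with int(), which is the case for CIFAR10. Note that this loads and transforms every image, so it is noticeably slower than indexing targets.

```python
from collections import Counter


def count_labels_by_iteration(subset):
    """Count labels by iterating over the items of a dataset or subset.

    Assumption: each item is an (input, label) pair and the label is an
    integer-like value (a plain int, or a 0-d tensor / NumPy integer).
    """
    # The input is discarded; only the label of each item is counted.
    return Counter(int(label) for _, label in subset)


# Usage with the objects from the question (not run here):
#   print(dict(count_labels_by_iteration(new_trainset_split)))
```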
Answered By: Ivan

The Subset class inherits from the Dataset class, but it does not copy the attributes of the dataset it wraps.
Instead, it keeps the original dataset in its dataset attribute, so you can reach the targets through it:

print(dict(Counter(new_trainset_split.dataset.targets)))

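Note that this counts the labels of the whole training set, not only the 3000 datapoints in the subset, so the output should be: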

{6: 5000, 9: 5000, 4: 5000, 1: 5000, 2: 5000, 7: 5000, 8: 5000, 3: 5000, 5: 5000, 0: 5000}
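
To count only the datapoints that belong to the subset, the dataset attribute can be combined with the subset's indices attribute, which stores the positions passed to Subset. This is a hedged sketch rather than a library feature: subset_label_counts is a hypothetical helper, and it assumes the wrapped dataset has an indexable targets sequence, as torchvision's CIFAR10 does.

```python
from collections import Counter


def subset_label_counts(subset):
    """Count the labels of the samples selected by a Subset-like object.

    Assumptions: `subset.indices` holds the selected positions and
    `subset.dataset.targets` is an indexable sequence of labels.
    """
    targets = subset.dataset.targets
    # int(i) turns tensor or NumPy integer indices into plain Python ints.
    return Counter(targets[int(i)] for i in subset.indices)


# Usage with the objects from the question (not run here):
#   print(dict(subset_label_counts(new_trainset_split)))
```

With the subset from the question, the counts should add up to 3000 rather than 50000.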
Answered By: jvel07