How to improve a PyTorch model?

Question

Good evening. I have 4 classes of black-and-white images. Each class has 3000 images, with a test set of 600 images. How can I improve this model? Here is my full code:

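import time
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

data_transform = transforms.Compose([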
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize(size=(150, 150)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),
])

Loading Images using ImageFolder

train_data = datasets.ImageFolder(root=train_dir,
                                  transform=data_transform, # Transform the data
                                  target_transform=None) # Transform the Label

test_data = datasets.ImageFolder(root=test_dir,
                                  transform=data_transform, # Transform the data
                                  target_transform=None) # Transform the Label
train_data, test_data

Turn the data into DataLoaders

BATCH_SIZE = 8

train_dataloader = DataLoader(
    dataset=train_data,
    batch_size=BATCH_SIZE, # How many images the model sees at a time
    num_workers=8,  # Number of worker subprocesses used for data loading
    shuffle=True
)

test_dataloader = DataLoader(
    dataset=test_data,
    batch_size=BATCH_SIZE, # How many images the model sees at a time
    num_workers=8,  # Number of worker subprocesses used for data loading
    shuffle=False
)

Create class

class TingVGG(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.dropout = nn.Dropout(0.4)
        # Three 2x2 max-pools shrink the 150x150 input to 18x18 feature maps (150 -> 75 -> 37 -> 18)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden_units*18*18, out_features=output_shape)
        )

    def forward(self, x: torch.Tensor):
        x = self.conv_block1(x)
        x = self.dropout(x)
        x = self.classifier(x)
        return x

Train the Model

# Set random seed
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Set number of epochs
NUM_EPOCHS = 10

# Create an instance of TingVGG
model_0 = TingVGG(input_shape=1, # Number of channels in the input image (1 for grayscale)
                  hidden_units=128,
                  output_shape=len(train_data.classes)).to(device)

# Setup the loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params= model_0.parameters(),
                             lr= 0.001)

# Start the timer
start_time = time.time()

# Train model 0
model_0_results = train(model= model_0,
                        train_dataloader= train_dataloader,
                        test_dataloader= test_dataloader,
                        optimizer= optimizer,
                        loss_fn= loss_fn,
                        epochs= NUM_EPOCHS
                        )


 10%|█         | 1/10 [02:30<22:37, 150.88s/it]
Epoch: 1 | train_loss: 0.3549 | train_acc: 0.8668 | test_loss: 0.3059 | test_acc: 0.8842
 20%|██        | 2/10 [04:57<19:48, 148.59s/it]
Epoch: 2 | train_loss: 0.1707 | train_acc: 0.9420 | test_loss: 0.2648 | test_acc: 0.9062
 30%|███       | 3/10 [07:24<17:14, 147.83s/it]
Epoch: 3 | train_loss: 0.1153 | train_acc: 0.9627 | test_loss: 0.2790 | test_acc: 0.8962
 40%|████      | 4/10 [09:52<14:46, 147.71s/it]
Epoch: 4 | train_loss: 0.0900 | train_acc: 0.9695 | test_loss: 0.2719 | test_acc: 0.8979
 50%|█████     | 5/10 [12:19<12:18, 147.65s/it]
Epoch: 5 | train_loss: 0.0760 | train_acc: 0.9758 | test_loss: 0.2927 | test_acc: 0.8950
 60%|██████    | 6/10 [14:47<09:50, 147.57s/it]
Epoch: 6 | train_loss: 0.0616 | train_acc: 0.9814 | test_loss: 0.3326 | test_acc: 0.8942
 70%|███████   | 7/10 [17:15<07:23, 147.76s/it]
Epoch: 7 | train_loss: 0.0488 | train_acc: 0.9838 | test_loss: 0.3086 | test_acc: 0.8946
 80%|████████  | 8/10 [19:42<04:55, 147.60s/it]
Epoch: 8 | train_loss: 0.0534 | train_acc: 0.9835 | test_loss: 0.3186 | test_acc: 0.9017
 90%|█████████ | 9/10 [22:10<02:27, 147.66s/it]
Epoch: 9 | train_loss: 0.0422 | train_acc: 0.9878 | test_loss: 0.3317 | test_acc: 0.9012
100%|██████████| 10/10 [24:38<00:00, 147.80s/it]
Epoch: 10 | train_loss: 0.0433 | train_acc: 0.9878 | test_loss: 0.3853 | test_acc: 0.9038

The train and test loss and accuracy

Confusion matrix

# Import tqdm for progress bar
from tqdm.auto import tqdm

# 1. Make predictions with trained model
y_preds = []
model_0.eval()
with torch.inference_mode():
  for X, y in tqdm(test_dataloader, desc="Making predictions"):
    # Send data and targets to target device
    X, y = X.to(device), y.to(device)
    # Do the forward pass
    y_logit = model_0(X)
    # Turn predictions from logits -> prediction probabilities -> predictions labels
    y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1)
    # Put predictions on CPU for evaluation
    y_preds.append(y_pred.cpu())
# Concatenate list of predictions into a tensor
y_pred_tensor = torch.cat(y_preds)




from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix

# 2. Setup confusion matrix instance and compare predictions to targets
confmat = ConfusionMatrix(num_classes=len(class_names), task='multiclass')
confmat_tensor = confmat(preds=y_pred_tensor,
                         target=torch.tensor(test_data.targets)) # integer class labels
# 3. Plot the confusion matrix
fig, ax = plot_confusion_matrix(
    conf_mat=confmat_tensor.numpy(), # matplotlib likes working with NumPy 
    class_names=class_names, # turn the row and column labels into class names
    figsize=(10, 7)
);

Confusion Result

So what should I do?

Asked By: Emad Younan

Answer 1

From the loss plot it is clear that your model overfits the training data after a few epochs. This means that instead of learning general features from the input images, it is specializing in the particular images used for training.
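The numbers logged in the question tell the same story. As a small check (the values are copied from the training log in the question; the variable names are just for illustration), you can look for the epoch with the lowest test loss and compare the gap between test and train loss at the start and at the end of training:

```python
# Per-epoch losses copied from the training log in the question
train_loss = [0.3549, 0.1707, 0.1153, 0.0900, 0.0760,
              0.0616, 0.0488, 0.0534, 0.0422, 0.0433]
test_loss = [0.3059, 0.2648, 0.2790, 0.2719, 0.2927,
             0.3326, 0.3086, 0.3186, 0.3317, 0.3853]

# Epoch (1-based) at which the test loss is lowest
best_epoch = min(range(len(test_loss)), key=lambda i: test_loss[i]) + 1

# Gap between test and train loss; a growing gap is a typical sign of overfitting
gaps = [te - tr for tr, te in zip(train_loss, test_loss)]

print(f"Lowest test loss at epoch {best_epoch}")
print(f"Gap at epoch 1: {gaps[0]:.4f}, gap at epoch 10: {gaps[-1]:.4f}")
```

The test loss is lowest at epoch 2 and drifts upward afterwards, while the training loss keeps falling overall.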
What you can do is try to reduce the size of the model, which helps it learn more general features. I also suggest trying to increase your dataset size (12000 training samples is rather few). You might try a data augmentation technique such as elastic transformation. You will find more information on the data augmentation available in PyTorch in the torchvision transforms documentation.
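For the first suggestion, reducing the model size, here is a rough sketch of how you could compare parameter counts. It rebuilds the architecture from the question as a plain nn.Sequential. The helper names make_model and count_parameters are made up for this example, and hidden_units=32 is only an arbitrary smaller value to try, not a tuned recommendation:

```python
import torch
from torch import nn


def make_model(hidden_units: int, num_classes: int = 4) -> nn.Sequential:
    """Same layer layout as the model in the question, for 1x150x150 inputs."""
    def block(in_channels: int) -> list:
        return [
            nn.Conv2d(in_channels, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        ]

    return nn.Sequential(
        *block(1), *block(hidden_units), *block(hidden_units),
        nn.Dropout(0.4),
        nn.Flatten(),
        # Three 2x2 max-pools: 150 -> 75 -> 37 -> 18
        nn.Linear(hidden_units * 18 * 18, num_classes),
    )


def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters in the model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


for hidden_units in (128, 32):
    model = make_model(hidden_units)
    # Dummy batch of two blank images to confirm the shapes line up
    logits = model(torch.zeros(2, 1, 150, 150))
    print(hidden_units, count_parameters(model), tuple(logits.shape))
```

Whether the smaller model actually generalizes better is something only a training run on your data can tell.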

Answered By: Niccolò Borgioli

Answer 2

As @Niccolò Borgioli mentioned, your model is overfitting the training data. A very simple change that will most likely improve your results is adding weight decay. It modifies the loss function to include a term proportional to the squared norm of the weights, which pushes the weights toward zero and reduces overfitting by avoiding weird situations where a lot of non-zero weights just counterbalance each other.
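Written out, and assuming the usual L2 form of weight decay (the symbols λ for the decay strength and w for the weights are notation introduced here, not taken from the question), the penalized loss and its gradient are:

```latex
\mathcal{L}_{\text{total}}(w) = \mathcal{L}_{\text{CE}}(w) + \frac{\lambda}{2}\,\lVert w \rVert_2^2,
\qquad
\nabla_w \mathcal{L}_{\text{total}}(w) = \nabla_w \mathcal{L}_{\text{CE}}(w) + \lambda\, w
```

The weight_decay argument of torch.optim.Adam works through the second expression: it adds λw to each parameter's gradient before the Adam update is computed.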

In practice you can very simply use:

DECAY_STRENGTH = 10**(-3)
optimizer = torch.optim.Adam(params= model_0.parameters(),
                             lr= 0.001, weight_decay=DECAY_STRENGTH)

If the model still overfits, you can increase DECAY_STRENGTH; if it underfits (bad performance even on the training set), you can decrease it.
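To get a feel for how the strength changes things, here is a toy sketch that is deliberately not your image model: a single linear layer fit to random noise, with more features than samples so that it can memorize the data. The function fit_with_decay, the data sizes, and the decay values are all invented for this illustration:

```python
import torch
from torch import nn


def fit_with_decay(weight_decay: float, steps: int = 500) -> tuple:
    """Fit a linear layer to pure noise; return (final train loss, weight norm)."""
    torch.manual_seed(0)        # same data and same initial weights for every run
    X = torch.randn(20, 50)     # 20 samples, 50 features: easy to memorize
    y = torch.randn(20, 1)      # targets are noise, so there is nothing real to learn
    model = nn.Linear(50, 1)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01,
                                 weight_decay=weight_decay)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        final_loss = loss_fn(model(X), y).item()
    return final_loss, model.weight.norm().item()


# Try a few decay strengths, from none to very strong
results = {wd: fit_with_decay(wd) for wd in (0.0, 1e-3, 1e-1, 1.0)}
for wd, (loss, norm) in results.items():
    print(f"weight_decay={wd:g}: train_loss={loss:.4f}, weight_norm={norm:.4f}")
```

In this toy setup the strongest decay should end with a clearly smaller weight norm than no decay at all. On the real model you would compare the test loss across a few values of DECAY_STRENGTH instead.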

Always keep in mind that the best way to avoid overfitting is to have a larger training set, so if at all possible (if you have control over the data collection), try to collect more training data. Otherwise, you can also try data augmentation as suggested by @Niccolò Borgioli.

Answered By: Caridorc