How to improve a PyTorch model?
Question:
Good evening. I have 4 classes of black-and-white images. Each class has 3000 images, with a test set of 600 images. How can I improve this model? Here is my full code:
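import time
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

data_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize(size=(150, 150)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),
])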
Loading Images using ImageFolder
train_data = datasets.ImageFolder(root=train_dir,
                                  transform=data_transform,  # Transform the data
                                  target_transform=None)     # Transform the label
test_data = datasets.ImageFolder(root=test_dir,
                                 transform=data_transform,   # Transform the data
                                 target_transform=None)      # Transform the label
train_data, test_data
Turn the data to DataLoader
BATCH_SIZE = 8
train_dataloader = DataLoader(
    dataset=train_data,
    batch_size=BATCH_SIZE,  # How many images the model sees at a time
    num_workers=8,          # Number of worker subprocesses used to load data
    shuffle=True
)
test_dataloader = DataLoader(
    dataset=test_data,
    batch_size=BATCH_SIZE,  # How many images the model sees at a time
    num_workers=8,          # Number of worker subprocesses used to load data
    shuffle=False
)
Create class
class TingVGG(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.dropout = nn.Dropout(0.4)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden_units * 18 * 18, out_features=output_shape)
        )

    def forward(self, x: torch.Tensor):
        x = self.conv_block1(x)
        x = self.dropout(x)
        x = self.classifier(x)
        return x
Train the Model
# Set random seed
torch.manual_seed(42)
torch.cuda.manual_seed(42)
# Set number of epochs
NUM_EPOCHS = 10
# Create and initialize an instance of TingVGG
model_0 = TingVGG(input_shape=1,  # Number of channels in the input image (grayscale -> 1)
                  hidden_units=128,
                  output_shape=len(train_data.classes)).to(device)
# Set up the loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_0.parameters(),
                             lr=0.001)
# Start the timer
start_time = time.time()
# Train model 0
model_0_results = train(model=model_0,
                        train_dataloader=train_dataloader,
                        test_dataloader=test_dataloader,
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=NUM_EPOCHS
                        )
10%|█ | 1/10 [02:30<22:37, 150.88s/it]
Epoch: 1 | train_loss: 0.3549 | train_acc: 0.8668 | test_loss: 0.3059 | test_acc: 0.8842
20%|██ | 2/10 [04:57<19:48, 148.59s/it]
Epoch: 2 | train_loss: 0.1707 | train_acc: 0.9420 | test_loss: 0.2648 | test_acc: 0.9062
30%|███ | 3/10 [07:24<17:14, 147.83s/it]
Epoch: 3 | train_loss: 0.1153 | train_acc: 0.9627 | test_loss: 0.2790 | test_acc: 0.8962
40%|████ | 4/10 [09:52<14:46, 147.71s/it]
Epoch: 4 | train_loss: 0.0900 | train_acc: 0.9695 | test_loss: 0.2719 | test_acc: 0.8979
50%|█████ | 5/10 [12:19<12:18, 147.65s/it]
Epoch: 5 | train_loss: 0.0760 | train_acc: 0.9758 | test_loss: 0.2927 | test_acc: 0.8950
60%|██████ | 6/10 [14:47<09:50, 147.57s/it]
Epoch: 6 | train_loss: 0.0616 | train_acc: 0.9814 | test_loss: 0.3326 | test_acc: 0.8942
70%|███████ | 7/10 [17:15<07:23, 147.76s/it]
Epoch: 7 | train_loss: 0.0488 | train_acc: 0.9838 | test_loss: 0.3086 | test_acc: 0.8946
80%|████████ | 8/10 [19:42<04:55, 147.60s/it]
Epoch: 8 | train_loss: 0.0534 | train_acc: 0.9835 | test_loss: 0.3186 | test_acc: 0.9017
90%|█████████ | 9/10 [22:10<02:27, 147.66s/it]
Epoch: 9 | train_loss: 0.0422 | train_acc: 0.9878 | test_loss: 0.3317 | test_acc: 0.9012
100%|██████████| 10/10 [24:38<00:00, 147.80s/it]
Epoch: 10 | train_loss: 0.0433 | train_acc: 0.9878 | test_loss: 0.3853 | test_acc: 0.9038
The train and test loss and accuracy
Confusion matrix
# Import tqdm for progress bar
from tqdm.auto import tqdm
# 1. Make predictions with trained model
y_preds = []
model_0.eval()
with torch.inference_mode():
    for X, y in tqdm(test_dataloader, desc="Making predictions"):
        # Send data and targets to target device
        X, y = X.to(device), y.to(device)
        # Do the forward pass
        y_logit = model_0(X)
        # Turn predictions from logits -> prediction probabilities -> prediction labels
        y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1)
        # Put predictions on CPU for evaluation
        y_preds.append(y_pred.cpu())
# Concatenate list of predictions into a tensor
y_pred_tensor = torch.cat(y_preds)
from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix
# 2. Set up confusion matrix instance and compare predictions to targets
confmat = ConfusionMatrix(num_classes=len(class_names), task='multiclass')
confmat_tensor = confmat(preds=y_pred_tensor,
                         target=torch.tensor(test_data.targets))  # integer class labels
# 3. Plot the confusion matrix
fig, ax = plot_confusion_matrix(
    conf_mat=confmat_tensor.numpy(),  # matplotlib likes working with NumPy
    class_names=class_names,  # turn the row and column labels into class names
    figsize=(10, 7)
)
Confusion Result
So what should I do?
Answers:
From the loss plot it is clear that your model overfits the training data after a few epochs. This means that instead of learning general features from the input images, it is specializing on the ones used for training.
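The same pattern can be read off the numbers logged in the question. The short sketch below only re-uses the ten train_loss and test_loss values printed there. The variable names are made up for illustration, and the test set is standing in for a validation set, which is what the question's code does.

```python
# Losses copied from the training log in the question (epochs 1 to 10).
train_loss = [0.3549, 0.1707, 0.1153, 0.0900, 0.0760,
              0.0616, 0.0488, 0.0534, 0.0422, 0.0433]
test_loss = [0.3059, 0.2648, 0.2790, 0.2719, 0.2927,
             0.3326, 0.3086, 0.3186, 0.3317, 0.3853]

# Epoch (1-based) at which the test loss is lowest.
best_epoch = min(range(len(test_loss)), key=test_loss.__getitem__) + 1

# Gap between test and train loss per epoch; a growing gap signals overfitting.
gaps = [round(te - tr, 4) for tr, te in zip(train_loss, test_loss)]

print("best epoch by test loss:", best_epoch)
for epoch, gap in enumerate(gaps, start=1):
    print(f"epoch {epoch:2d}: test_loss - train_loss = {gap:+.4f}")
```

The test loss bottoms out early while the train loss keeps falling, so the later epochs mostly fit the training images rather than improve generalization.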
You can try reducing the size of the model to help it learn more general features. At the same time, I suggest increasing your dataset size (12000 training samples is rather few). You might also try a data augmentation technique such as elastic transformation. You can find more information on the data augmentation available in PyTorch in the torchvision transforms documentation.
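To get a feel for how much smaller the model becomes, here is a rough sketch that counts the trainable parameters of the architecture in the question for a few values of hidden_units. The helper names pooled_size and param_count are hypothetical. It assumes the layout shown in the question: three 3x3 convolutions with padding 1, three 2x2 max-pools, 150x150 inputs and 4 output classes.

```python
def pooled_size(size: int, num_pools: int) -> int:
    """Spatial size after repeated 2x2 max-pooling with stride 2 (floor division)."""
    for _ in range(num_pools):
        size //= 2
    return size


def param_count(hidden_units: int, in_channels: int = 1,
                num_classes: int = 4, image_size: int = 150) -> int:
    """Trainable parameters of the three-conv network from the question."""
    def conv_params(c_in: int, c_out: int) -> int:
        # 3x3 kernel weights plus one bias per output channel
        return c_in * c_out * 3 * 3 + c_out

    side = pooled_size(image_size, num_pools=3)  # 150 -> 75 -> 37 -> 18
    convs = conv_params(in_channels, hidden_units) + 2 * conv_params(hidden_units, hidden_units)
    linear = hidden_units * side * side * num_classes + num_classes
    return convs + linear


for h in (128, 64, 32):
    print(f"hidden_units={h:3d}: {param_count(h):,} parameters")
```

The pooled side length of 18 matches the hidden_units*18*18 input size of the linear layer in the question. Whether a smaller hidden_units value actually generalizes better has to be checked by retraining.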
As @Niccolò Borgioli mentioned, your model is overfitting the training data. A very simple change that will most likely help is adding weight decay. It modifies the loss function to include a term proportional to the norm of the weights, which pushes the weights toward zero and reduces overfitting by avoiding odd situations where many non-zero weights just counterbalance each other.
In practice, you can simply use:
DECAY_STRENGTH = 10**(-3)
optimizer = torch.optim.Adam(params=model_0.parameters(),
                             lr=0.001, weight_decay=DECAY_STRENGTH)
If the model still overfits, you can increase DECAY_STRENGTH; if it underfits (bad performance even in training), you can decrease it.
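To see the effect of weight_decay in isolation, the toy sketch below takes a single Adam step on one parameter whose data gradient is exactly zero. The helper one_adam_step is hypothetical and exists only for this illustration. Any movement of the weight then comes from the decay term alone.

```python
import torch


def one_adam_step(weight_decay: float) -> float:
    """Take one Adam step on a single weight whose data gradient is zero."""
    w = torch.nn.Parameter(torch.tensor([1.0]))
    optimizer = torch.optim.Adam([w], lr=0.001, weight_decay=weight_decay)
    loss = (0.0 * w).sum()  # the loss does not depend on w, so its gradient is 0
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return w.item()


plain = one_adam_step(weight_decay=0.0)     # no decay: the weight does not move
decayed = one_adam_step(weight_decay=1e-3)  # decay alone pulls the weight toward 0

print("without weight decay:", plain)
print("with weight decay:   ", decayed)
```

With real data, this pull toward zero competes with the gradient of the loss, so only weights that genuinely reduce the loss stay large.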
Always keep in mind that the best way to avoid overfitting is to have a larger training set, so if at all possible (if you have control over the data collection), try increasing the training data. Otherwise, you can also try data augmentation, as suggested by @Niccolò Borgioli.