Training VGG16 from scratch doesn't improve accuracy in Keras

Question:

I’m trying to train VGG16 models using both transfer learning and training from scratch. My dataset has 4 categories with about 7k images per category. The transfer learning code works without issue; however, the same program adapted to train from scratch does not seem to work.

creating the model for transfer learning:

from tensorflow.keras import Model, optimizers
from tensorflow.keras import applications as apps
from tensorflow.keras.layers import Dense, Flatten

base_model = apps.VGG16(
    include_top=False,  # leave out the final FC layers; we add our own below
    weights="imagenet",
    input_shape=input_shape,
    classifier_activation="softmax",  # note: ignored when include_top=False
    pooling=pooling,
)

# Freeze the base model
for layer in base_model.layers:
    layer.trainable = False

# convert output of base model to a 1D vector
x = Flatten()(base_model.output)

# Two 4096-unit fully connected layers, as in the original VGG16 head
x = Dense(units=4096, activation='relu')(x)  # relu avoids the vanishing gradient problem
x = Dense(units=4096, activation='relu')(x)

# The final layer is a softmax layer
prediction = Dense(4, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=prediction)
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(learning_rate=0.001),
              metrics=['accuracy'])

Meanwhile, for training from scratch:

model = apps.VGG16(
    include_top=True,   # keep the final FC layers
    weights=None,       # no pretrained weights: train from scratch
    input_shape=input_shape,
    classifier_activation="softmax",
    pooling=pooling,    # note: ignored when include_top=True
    classes=4,          # set the number of outputs to the required count
)

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(learning_rate=0.1),  # I've experimented with values as low as 0.001
              metrics=['accuracy'])
model.summary()

and the training is done via:

history = model.fit(train_images,
                    validation_data=val_images,
                    epochs=epochs,
                    verbose=1,
                    callbacks=callbacks)

Transfer learning converges in around 10 epochs, whereas when training from scratch I’ve gone up to 20 epochs, with accuracy and val_accuracy both converging to exactly 0.2637, which is barely above chance (0.25 for 4 classes). I also use a ReduceLROnPlateau callback, which does make a difference when transfer learning.
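For context, a ReduceLROnPlateau callback along these lines can be passed to model.fit (the monitor/factor/patience values below are assumptions, not the question's actual settings):

```python
from tensorflow.keras import callbacks

# One plausible configuration; the question's exact settings are not shown.
reduce_lr = callbacks.ReduceLROnPlateau(
    monitor="val_loss",  # watch validation loss
    factor=0.5,          # halve the learning rate on a plateau
    patience=3,          # after 3 epochs without improvement
    min_lr=1e-6,
    verbose=1,
)
# then: model.fit(..., callbacks=[reduce_lr])
```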

I’m training on an NVIDIA GeForce RTX 3060 Laptop GPU.

EDIT: I should mention that I am getting a loss of NaN when training from scratch.
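(As an aside, Keras ships a TerminateOnNaN callback that aborts fit() the moment the loss becomes NaN, which makes this kind of divergence visible immediately instead of after many epochs. A minimal self-contained sketch on a toy model, not the question's pipeline:)

```python
import numpy as np
from tensorflow.keras import callbacks, layers, models, optimizers

# Toy stand-in for the real pipeline (shapes and data here are made up):
inputs = layers.Input(shape=(16,))
hidden = layers.Dense(32, activation="relu")(inputs)
outputs = layers.Dense(4, activation="softmax")(hidden)
model = models.Model(inputs, outputs)
model.compile(loss="categorical_crossentropy",
              optimizer=optimizers.Adam(learning_rate=0.001))

x = np.random.rand(64, 16).astype("float32")
y = np.eye(4)[np.random.randint(0, 4, size=64)].astype("float32")

# TerminateOnNaN stops training as soon as the loss becomes NaN.
history = model.fit(x, y, epochs=1, batch_size=16, verbose=0,
                    callbacks=[callbacks.TerminateOnNaN()])
```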

Answers:

The problem was resolved by switching to the SGD optimizer. Adam with a large learning rate can diverge when training a deep network like VGG16 from scratch, which is consistent with the NaN losses observed.
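Sketched as a hedged example, the fix amounts to swapping the optimizer in the from-scratch compile step (the learning rate, momentum, and input_shape below are assumptions; the answer only states that SGD resolved the problem):

```python
from tensorflow.keras import applications, optimizers

# Same from-scratch setup, but compiled with SGD instead of Adam.
# input_shape is assumed to be 224x224 RGB; the question's value isn't shown.
model = applications.VGG16(
    include_top=True,
    weights=None,  # random initialization, training from scratch
    input_shape=(224, 224, 3),
    classes=4,
)

# SGD (optionally with momentum) is often more stable than Adam when
# training VGG-style nets from scratch; a small learning rate avoids
# the divergence that produced the NaN losses.
model.compile(loss="categorical_crossentropy",
              optimizer=optimizers.SGD(learning_rate=0.01, momentum=0.9),
              metrics=["accuracy"])
```

From here, model.fit(...) is called exactly as in the question.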