Do architectures built using tf.keras.models.Sequential run more slowly and more accurately than those using TensorFlow’s Functional API?

Question:

I just compared two (I thought) equivalent VGG-ish architectures. One was constructed using tf.keras.models.Sequential; the other used TensorFlow’s Functional API. Each was trained on the cats_vs_dogs dataset.

After 10 training epochs, the Sequential model had these runtimes and accuracies:

Epoch 10/10
703/703 [==============] - 16s 23ms/step - accuracy: 0.9271 - val_accuracy: 0.8488

But the Functional API output had these runtimes and accuracies:

Epoch 10/10
703/703 [==============] - 15s 22ms/step - accuracy: 0.8483 - val_accuracy: 0.8072

The difference in accuracy struck me as severe; the difference in training time was smaller, but consistent. Now I’m wondering whether my nets are truly equivalent, or whether there’s some difference between Sequential and the Functional API that accounts for this.

I imported the following modules:

import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.optimizers import RMSprop

I used these versions of tf & tfds:

Tensorflow Version: 2.10.0
Tensorflow_Datasets Version: 4.7.0+nightly (Note: this was what I got from Conda, not a purposeful choice)

I downloaded the cats vs. dogs dataset directly from Kaggle, since I hit a checksum error when I tried to download it with the standard tfds methods. I ultimately had to remove some files that were apparently corrupt (!?) or used CMYK color coding (!?), but there were fewer than 10 such images.
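A minimal sketch of how such files can be found before building the dataset (assuming Pillow is installed; ./cat_vs_dog/ is the same directory used below; this is not necessarily the exact code I ran):

import os
from PIL import Image

def find_bad_images(root):
    '''Return paths of images that fail to open or that use CMYK color coding.'''
    bad = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with Image.open(path) as img:
                    if img.mode == 'CMYK':
                        bad.append(path)   # CMYK-encoded image
            except Exception:
                bad.append(path)           # truncated / unreadable file
    return bad

print(find_bad_images('./cat_vs_dog/'))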

I constructed the datasets in this way:

builder = tfds.folder_dataset.ImageFolder('./cat_vs_dog/')
dataset = builder.as_dataset(split='train', shuffle_files=True)
d2 = builder.as_dataset(split='test', shuffle_files=True)

def preprocess(features):
   # Resize and normalize
   image = tf.image.resize(features['image'], (224, 224))
   return tf.cast(image, tf.float32) / 255., features['label']

# preprocess dataset
dataset = dataset.map(preprocess).batch(32)
d2 = d2.map(preprocess).batch(32)

The relevant fields from builder.info are:

tfds.core.DatasetInfo(
    features=FeaturesDict({
        'image': Image(shape=(None, None, 3), dtype=tf.uint8),
        'image/filename': Text(shape=(), dtype=tf.string),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=2500, num_shards=1>,
        'train': <SplitInfo num_examples=22495, num_shards=1>,
    },
)

I constructed the Sequential model like this:

def seq_model():

  model = tf.keras.models.Sequential([ 
      tf.keras.layers.Conv2D(32, (3, 3), activation = 'relu', 
                             input_shape = (224, 224, 3)),
      tf.keras.layers.MaxPooling2D(2, 2),
      tf.keras.layers.Conv2D(64, (3, 3), activation = 'relu'),
      tf.keras.layers.MaxPooling2D(2,2),
      tf.keras.layers.Conv2D(128, (3, 3), activation = 'relu'),
      tf.keras.layers.MaxPooling2D(2, 2),
      tf.keras.layers.Conv2D(128, (3, 3), activation = 'relu'),
      tf.keras.layers.MaxPooling2D(2, 2),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(512, activation = 'relu'),
      tf.keras.layers.Dense(1, activation = 'sigmoid'),
  ])

  model.compile(optimizer = RMSprop(learning_rate = 1e-4),
                loss = 'binary_crossentropy',
                metrics = ['accuracy']) 
    

  return model
  
model = seq_model()
history = model.fit(dataset, validation_data=d2, epochs=10)
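(Aside: for the Sequential model, the per-layer output shapes can be printed directly; getting the same information out of the other net is what the inspection code further down is for.)

model.summary()   # prints each layer's output shape and parameter count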

I constructed the Functional API model like this:

class Mini_Block(tf.keras.Model):
    def __init__(self, filters, kernel_size, pool_size=2, strides=2):
        super().__init__()
        self.filters = filters
        self.kernel_size = kernel_size
            
        # Define a Conv2D layer, specifying filters, 
        # kernel_size, activation and padding.
        self.conv2D_0 = tf.keras.layers.Conv2D(filters=filters, 
                                                        kernel_size=kernel_size, 
                                                        activation='relu',
                                                        strides=strides,
                                                        padding='same')
        
        # Define the max pool layer that will be added after the Conv2D blocks
        self.max_pool = tf.keras.layers.MaxPooling2D(pool_size=pool_size, 
                                                     strides=strides,
                                                     padding='same')
  
    def call(self, inputs):
        # access the class's conv2D_0 layer
        conv2D_0 = self.conv2D_0
        
        # Connect the conv2D_0 layer to inputs
        x = conv2D_0(inputs)

        # Finally, add the max_pool layer
        max_pool = self.max_pool(x)
        
        return max_pool
    
class MiniVGG(tf.keras.Model):

    def __init__(self, num_classes):
        super().__init__()

        # Creating VGG blocks
        self.block_a = Mini_Block(filters=32, kernel_size=3)
        self.block_b = Mini_Block(filters=64, kernel_size=3)
        self.block_c = Mini_Block(filters=128, kernel_size=3)
        self.block_d = Mini_Block(filters=128, kernel_size=3)        

        # Classification Head
        self.flatten = tf.keras.layers.Flatten()
        self.fc = tf.keras.layers.Dense(512, activation='relu')
        self.classifier = tf.keras.layers.Dense(1, activation='sigmoid')
        
    def call(self, inputs):
        # Chain all the layers one after the other
        x = self.block_a(inputs)
        x = self.block_b(x)
        x = self.block_c(x)
        x = self.block_d(x)
        x = self.flatten(x)
        x = self.fc(x)
        x = self.classifier(x)
        return x
        
vgg = MiniVGG(num_classes=1)
vgg.compile(optimizer=RMSprop(learning_rate = 1e-4), 
                              loss='binary_crossentropy', 
                              metrics=['accuracy'])
hist = vgg.fit(dataset, validation_data=d2, epochs=10)       

Is there some structural difference between these two nets, or a reason that Sequential nets are much more accurate and slightly slower than those using the Functional API?


Edit: These nets actually differ in that the Sequential model uses ‘valid’ padding and a stride of 1, while the Functional API model uses ‘same’ padding and a stride of 2. Tragically, this is not the whole story.

Following @V.M’s suggestion, I looked at the nets’ architectures directly, using this code:

def get_params(curr_layer, spaces=""):
    if hasattr(curr_layer,'layers'):
        print(spaces,curr_layer.name)
        for sub_layer in curr_layer.layers:
            get_params(sub_layer, spaces+"  ")
    elif hasattr(curr_layer,'weights'):
        print(spaces,curr_layer.name)
        for xx in curr_layer.weights:
            print(spaces+"  Weights Shape:",xx.shape)  
        if len(curr_layer.weights) < 1:
            print(spaces+"  ", "No Weights") 
        if "conv" in curr_layer.name:
            print(spaces + "  Padding:", curr_layer.padding)
            print(spaces + "  Strides:", curr_layer.strides)

I found some weirdness involving the flatten layer, but I think that’s a topic for another question.

Asked By: user1245262


Answers:

The reason for the differences between these two nets lies in the way they are padded (the Sequential model used ‘valid’ padding, the Functional API model used ‘same’ padding) and in the strides they used. The Sequential model used a stride of 1 for the conv layers and 2 for the pooling layers, while the Functional API model used a stride of 2 for every layer.

Another tricky (for me, anyway) aspect is that the conv layer has a default stride value of 1, while the pool layer has a default value of None, which is subsequently converted to match the pool size.
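A quick way to see the effect of those defaults (a sketch; the shapes in the comments are what compute_output_shape reports for a 224x224x3 input):

import tensorflow as tf

# Conv2D defaults: strides=(1, 1), padding='valid' (as in the Sequential model)
conv_valid = tf.keras.layers.Conv2D(32, 3)
# The Functional API model's conv layer: strides=2, padding='same'
conv_same = tf.keras.layers.Conv2D(32, 3, strides=2, padding='same')
# MaxPooling2D's strides default to None, which Keras converts to the pool size
pool = tf.keras.layers.MaxPooling2D(2)

print(conv_valid.compute_output_shape((None, 224, 224, 3)))  # (None, 222, 222, 32)
print(conv_same.compute_output_shape((None, 224, 224, 3)))   # (None, 112, 112, 32)
print(pool.compute_output_shape((None, 112, 112, 32)))       # (None, 56, 56, 32)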

I used the following two functions (and helpful comments from @V.M) to inspect my nets directly and figure this out. (Note: these functions are written specifically for my nets and problem, but should be easy to generalize.)

def get_params(curr_layer, spaces=""):
    '''Get internal net parameters for a single layer'''
    if hasattr(curr_layer,'layers'):
        print(spaces,curr_layer.name)
        for sub_layer in curr_layer.layers:
            # Recursive call, since 'Block' layer of functional API net was made up of sub-layers containing the parameters of interest
            get_params(sub_layer, spaces+"  ")
    elif hasattr(curr_layer,'weights'):
        print(spaces,curr_layer.name)
        for xx in curr_layer.weights:
            print(spaces + "  " +xx.name.split('/')[-1] +  " Shape:",xx.shape)  
        if hasattr(curr_layer,'padding'):
            print(spaces + "  Padding:", curr_layer.padding)
        if hasattr(curr_layer,'strides'):
            print(spaces + "  Strides:", curr_layer.strides)
        if hasattr(curr_layer,'pool_size'):
            print(spaces + "  Pool Size:", curr_layer.pool_size)

and

def feature_map_info(model, input_shape):
    ''' Get dimensions of feature maps at each layer'''
    for layer in model.layers:
        print(layer.name)
        try:
            for sub_layer in layer.layers:
                output_shape = sub_layer.compute_output_shape(input_shape)
                print("  ", sub_layer.name)
                print("     ", output_shape)
                input_shape = output_shape.as_list()
        except AttributeError:  # the layer has no sub-layers
            output_shape = layer.compute_output_shape(input_shape)
            print("     ", output_shape)
            input_shape = output_shape.as_list()

Note: for feature_map_info, the input shape needs to be entered as a 4-element list, [batch_size, rows, cols, depth], where batch_size can be set to None.
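For example, with the model and vgg objects from the question:

get_params(vgg)
feature_map_info(vgg, [None, 224, 224, 3])
get_params(model)
feature_map_info(model, [None, 224, 224, 3])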

For completeness, here are the matching nets with the Functional API & the Sequential model:

Functional API:

class Mini_Block2(tf.keras.Model):
    def __init__(self, filters, kernel_size, pool_size=2, strides=1):
        super().__init__()
        self.filters = filters
        self.kernel_size = kernel_size
            
        # Define a Conv2D layer, specifying filters, kernel_size, activation and padding.
        self.conv2D_0 = tf.keras.layers.Conv2D(filters=filters, 
                                                        kernel_size=kernel_size, 
                                                        activation='relu',
                                                        strides=strides,
                                                        padding='valid')
        
        # Define the max pool layer that will be added after the Conv2D blocks
        self.max_pool = tf.keras.layers.MaxPooling2D(pool_size=pool_size, 
                                                     padding='valid')
  
    def call(self, inputs):
        # access the class's conv2D_0 layer
        conv2D_0 = self.conv2D_0
        
        # Connect the conv2D_0 layer to inputs
        x = conv2D_0(inputs)

        # Finally, add the max_pool layer
        max_pool = self.max_pool(x)
        
        return max_pool
    
class MiniVGG2(tf.keras.Model):

    def __init__(self, num_classes):
        super().__init__()

        # Creating blocks of VGG with the following 
        # (filters, kernel_size, repetitions) configurations
        self.block_a = Mini_Block2(filters=32, kernel_size=3)
        self.block_b = Mini_Block2(filters=64, kernel_size=3)
        self.block_c = Mini_Block2(filters=128, kernel_size=3)
        self.block_d = Mini_Block2(filters=128, kernel_size=3)        

        # Classification head
        # Define a Flatten layer
        self.flatten = tf.keras.layers.Flatten()
        # Create a Dense layer with 512 units and ReLU as the activation function
        self.fc = tf.keras.layers.Dense(512, activation='relu')
        # Finally, add the sigmoid classifier using a Dense layer
        self.classifier = tf.keras.layers.Dense(1, activation='sigmoid')

    def call(self, inputs):
        # Chain all the layers one after the other
        x = self.block_a(inputs)
        x = self.block_b(x)
        x = self.block_c(x)
        x = self.block_d(x)
        x = self.flatten(x)
        x = self.fc(x)
        x = self.classifier(x)
        return x

Sequential:

def seq_model():

  model = tf.keras.models.Sequential([ 
      tf.keras.layers.Conv2D(32, (3, 3), activation = 'relu', input_shape = (224, 224, 3)),
      tf.keras.layers.MaxPooling2D(2, 2),
      
      tf.keras.layers.Conv2D(64, (3, 3), activation = 'relu'),
      tf.keras.layers.MaxPooling2D(2,2),
      
      tf.keras.layers.Conv2D(128, (3, 3), activation = 'relu'),
      tf.keras.layers.MaxPooling2D(2, 2),
      
      tf.keras.layers.Conv2D(128, (3, 3), activation = 'relu'),
      tf.keras.layers.MaxPooling2D(2, 2),
      
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(512, activation = 'relu'),
      tf.keras.layers.Dense(1, activation = 'sigmoid'),
  ])

  model.compile(optimizer = RMSprop(learning_rate = 1e-4),
                loss = 'binary_crossentropy',
                metrics = ['accuracy']) 
    

  return model
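As a quick sanity check (a sketch; calling the subclassed model on a dummy batch is just one way to force its weights to be created), the two nets can be confirmed to match by comparing parameter counts:

seq = seq_model()
vgg2 = MiniVGG2(num_classes=1)
vgg2(tf.zeros((1, 224, 224, 3)))   # run one dummy batch so the weights get created
print(seq.count_params())          # both nets should now report the same count
print(vgg2.count_params())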
Answered By: user1245262