How can I compare weights of different Keras models?


I’ve saved a number of models in .h5 format and want to compare their characteristics, such as their weights.
I have no idea how to compare them appropriately, especially in the form of tables and figures.
Thanks in advance.

Asked By: Codie



Weight-introspection is a fairly advanced endeavor, and requires model-specific treatment. Visualizing weights is a largely technical challenge, but what you do with that information’s a different matter – I’ll address largely the former, but touch upon the latter.

Update: I also recommend See RNN for weights, gradients, and activations visualization.

Visualizing weights: one approach is as follows:

  1. Retrieve weights of layer of interest. Ex: model.layers[1].get_weights()
  2. Understand weight roles and dimensionality. Ex: LSTMs have three sets of weights: kernel, recurrent, and bias, each serving a different purpose. Within each weight matrix are gate weights – Input, Cell, Forget, Output. For Conv layers, the kernel array is organized by kernel dimensions, input channels, and filters (the last dimension in Keras); strides affect how the kernel is applied but are not stored in the weights.
  3. Organize weight matrices for visualization in a meaningful manner per (2). Ex: for Conv, unlike for LSTM, feature-specific treatment isn’t really necessary, and we can simply flatten kernel weights and bias weights and visualize them in a histogram
  4. Select visualization method: histogram, heatmap, scatterplot, etc – for flattened data, a histogram is the best bet
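
As a minimal sketch of steps 2 and 3 for an LSTM, the snippet below splits each weight array into per-gate blocks so they can be plotted or summarized separately. It assumes the Keras layout, in which the four gate blocks are concatenated along the last axis in the order input, forget, cell, output. The helper name split_lstm_gates and the random arrays standing in for layer.get_weights() are illustrative only.

```python
import numpy as np

def split_lstm_gates(weight, units):
    """Split an LSTM weight array into its four per-gate blocks.

    Assumption: the last axis has length 4 * units and the gates are
    concatenated in the order input, forget, cell, output.
    """
    if weight.shape[-1] != 4 * units:
        raise ValueError("last axis must have length 4 * units")
    names = ("input", "forget", "cell", "output")
    blocks = np.split(weight, 4, axis=-1)
    return dict(zip(names, blocks))

# Random stand-ins for layer.get_weights() of an LSTM
# with 3 input features and 5 units (hypothetical sizes)
units = 5
rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 4 * units))
recurrent = rng.normal(size=(units, 4 * units))
bias = rng.normal(size=(4 * units,))

# One row per (weight set, gate): shape and mean of each block
for label, w in (("kernel", kernel), ("recurrent", recurrent), ("bias", bias)):
    for gate, block in split_lstm_gates(w, units).items():
        print(f"{label:<10}{gate:<8}{str(block.shape):<10}mean={block.mean():+.3f}")
```

Each block can then be passed to a histogram or heatmap on its own, which keeps the gates from being mixed together in a single plot.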

Interpreting weights: a few approaches are:

  • Sparsity: if the weight norm (roughly, the “average” magnitude) is low, the model is sparse. This may or may not be beneficial.
  • Health: if too many weights are zero or near-zero, it’s a sign of too many dead neurons; this can be useful for debugging, as once a layer’s in such a state, it usually does not revert – so training should be restarted
  • Stability: if weights are changing greatly and quickly, or if there are many high-valued weights, it may indicate impaired gradient performance, remedied by e.g. gradient clipping or weight constraints
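
The checks above can be turned into numbers and printed as a small table, one row per model. The following is a rough sketch, not a standard recipe: the function name weight_stats, the near-zero tolerance of 1e-3, and the toy arrays standing in for weights loaded from saved models are all assumptions made for illustration.

```python
import numpy as np

def weight_stats(weights, zero_tol=1e-3):
    """Summarize a list of weight arrays (as returned by get_weights()).

    zero_tol is an arbitrary threshold for "near-zero"; tune it per model.
    """
    flat = np.concatenate([np.ravel(w) for w in weights])
    abs_flat = np.abs(flat)
    return {
        "n_weights": flat.size,
        "mean_abs": float(abs_flat.mean()),                   # sparsity proxy
        "max_abs": float(abs_flat.max()),                     # stability proxy
        "frac_near_zero": float((abs_flat < zero_tol).mean()),  # health proxy
    }

# Toy stand-ins for the weights of one layer in two saved models
rng = np.random.default_rng(0)
models = {
    "model_a": [rng.normal(0, 0.05, size=(8, 8, 16, 12)), np.zeros(12)],
    "model_b": [rng.normal(0, 0.5, size=(8, 8, 16, 12)), np.zeros(12)],
}

print(f"{'model':<10}{'n_weights':>10}{'mean_abs':>10}{'max_abs':>10}{'near_zero':>10}")
for name, weights in models.items():
    s = weight_stats(weights)
    print(f"{name:<10}{s['n_weights']:>10}{s['mean_abs']:>10.4f}"
          f"{s['max_abs']:>10.4f}{s['frac_near_zero']:>10.4f}")
```

With real models, the lists would come from something like layer.get_weights() on each loaded model; tracking the same statistics across training checkpoints gives a rough view of how quickly the weights are changing.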

Model comparison: there is no way to simply look at two sets of weights from separate models side-by-side and decide “this is the better one”; analyze each model separately, for example as above, then decide which one’s upsides outweigh its downsides.

The ultimate tiebreaker, however, will be validation performance – and it’s also the more practical one. The procedure is:

  1. Train model for several hyperparameter configurations
  2. Select one with best validation performance
  3. Fine-tune that model (e.g. via further hyperparameter configs)

Weight visualization should be mainly kept as a debugging or logging tool – as, put simply, even with our best current understanding of neural networks one cannot tell how well the model will generalize just by looking at the weights.

Suggestion: also visualize layer outputs – see this answer and sample output at bottom.

Visual example:

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Input, Conv2D, Dense, Flatten
from tensorflow.keras.models import Model

# Small toy model: one Conv2D layer followed by a Dense output
ipt = Input(shape=(16, 16, 16))
x   = Conv2D(12, 8, 1)(ipt)  # 12 filters, 8x8 kernel, stride 1
x   = Flatten()(x)
out = Dense(16)(x)

model = Model(ipt, out)
model.compile('adam', 'mse')

X = np.random.randn(10, 16, 16, 16)  # toy data
Y = np.random.randn(10, 16)  # toy labels
for _ in range(10):
    model.train_on_batch(X, Y)

def get_weights_print_stats(layer):
    W = layer.get_weights()
    print(len(W))  # number of weight arrays in the layer
    for w in W:
        print(w.shape)
    return W

def hist_weights(weights, bins=500):
    # one histogram per weight array, drawn on the same axes
    for weight in weights:
        plt.hist(np.ndarray.flatten(weight), bins=bins)

W = get_weights_print_stats(model.layers[1])
# 2
# (8, 8, 16, 12)
# (12,)

hist_weights(W)
plt.show()



Conv1D outputs visualization: (source)

Answered By: OverLordGoldDragon

To compare the weights of two models, I vectorize the model weights (i.e., create a 1D array) for each model. Then, I calculate the percent difference between corresponding weights and construct a histogram of these percent differences. If all values are close to zero, this suggests (but does not prove) that the models are practically the same. This is just one approach out of many for comparing models using their weights.
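
A minimal sketch of this percent-difference approach, under some assumptions: both models share the same architecture and layer order, the difference is measured relative to the first model, and a small eps guards against division by zero. The function names and the toy weight lists (standing in for model.get_weights() on two loaded models) are hypothetical.

```python
import numpy as np

def flatten_weights(weights):
    """Concatenate a list of weight arrays into one 1D vector."""
    return np.concatenate([np.ravel(w) for w in weights])

def percent_difference(weights_a, weights_b, eps=1e-12):
    """Element-wise percent difference of model B relative to model A."""
    a = flatten_weights(weights_a)
    b = flatten_weights(weights_b)
    if a.shape != b.shape:
        raise ValueError("models must have the same number of weights")
    return 100.0 * (b - a) / (np.abs(a) + eps)

# Toy stand-ins for model_a.get_weights() and model_b.get_weights()
rng = np.random.default_rng(0)
weights_a = [rng.normal(size=(4, 3)), rng.normal(size=(3,))]
weights_b = [w * 1.01 for w in weights_a]  # every weight scaled by 1.01

diff = percent_difference(weights_a, weights_b)

# Bin counts for a histogram; plt.hist(diff, bins=10) would draw the same data
counts, edges = np.histogram(diff, bins=10)
print("min %:", diff.min(), "max %:", diff.max())
print("counts per bin:", counts)
```

Percent differences become very large for weights that are close to zero in the first model, so it may help to also inspect the absolute difference b - a.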

As an aside, I will note that I use this method when I want some indication that my model has converged on a global, rather than local, minimum. I will train models with several different initializations. If all the initializations result in convergence to the same weights, it suggests that the minimum is a global minimum.

Answered By: Snehal Patel