In Tensorflow, how can I access my model's weights when computing loss?

Question:

I’m implementing a model described in a research paper (https://ieeexplore.ieee.org/document/9897541 for those with access to IEEE Xplore, although access is not necessary to understand this question) which proposes a loss function.

This loss function is computed in part from the weights matrix from one of the layers in the model (a fully-connected layer which is a Keras Dense layer object in my code).

I’ve got a class subclassed from tf.keras.Model which has a custom implementation of train_step().

In the with tf.GradientTape() as tape: block, I calculate the loss, which involves calling M = self.fc_layer.get_weights()[0], which returns a NumPy array. Because that is not a TensorFlow tensor, the tape apparently cannot associate the resulting loss with the trainable weights, and I’m getting the following error:

ValueError: No gradients provided for any variable: (['conv1_conv/kernel:0', 'conv1_conv/bias:0', 'conv1_bn/gamma:0', 'conv1_bn/beta:0', 'conv2_block1_1_conv/kernel:0', 'conv2_block1_1_conv/bias:0', 'conv2_block1_1_bn/gamma:0', 'conv2_block1_1_bn/beta:0', 'conv2_block1_2_conv/kernel:0', 'conv2_block1_2_conv/bias:0', 'conv2_block1_2_bn/gamma:0', 'conv2_block1_2_bn/beta:0', 'conv2_block1_0_conv/kernel:0', 'conv2_block1_0_conv/bias:0', 'conv2_block1_3_conv/kernel:0', 'conv2_block1_3_conv/bias:0', 'conv2_block1_0_bn/gamma:0', 'conv2_block1_0_bn/beta:0', 'conv2_block1_3_bn/gamma:0', 'conv2_block1_3_bn/beta:0', 'conv2_block2_1_conv/kernel:0', 'conv2_block2_1_conv/bias:0', 'conv2_block2_1_bn/gamma:0', 'conv2_block2_1_bn/beta:0', 'conv2_block2_2_conv/kernel:0', 'conv2_block2_2_conv/bias:0', 
etc.
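For reference, my train_step looks roughly like the following (a simplified sketch with placeholder names such as MyReIDModel and paper_loss, not my exact code):

class MyReIDModel(tf.keras.Model):
    # simplified sketch, not the exact code

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            embedding = self(x, training=True)
            # get_weights() returns a plain NumPy array, which the tape cannot trace
            M = self.fc_layer.get_weights()[0]
            loss = paper_loss(y, embedding, M)  # placeholder for the paper's loss
        grads = tape.gradient(loss, self.trainable_variables)
        # grads come back as None for every variable, which triggers the ValueError above
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}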

If I’m right that this is the problem, how can I access the weights of self.fc_layer without breaking TensorFlow's graph requirements? I’ve had to enable eager execution just to get this far, which I assume is a clear sign that my current code is not graph-compatible for exactly that reason.

Extra information if you’re interested:
The weights are needed in the loss function because each row of the weights matrix represents a class centre; the cosine similarity between the embedding and that centre is computed in order to maximise inter-class difference in the embeddings generated for a vehicle re-identification task.

Asked By: magmacollaris


Answers:

I would suggest building your own fully connected layer. A basic dense layer that takes the classes and viewpoints into account, with weight tensor $$M \in \mathbb{R}^{d\times C\times V}$$, is:

import tensorflow as tf

class DenseCustom(tf.keras.layers.Layer):

    def __init__(self, C, V, activation=None, activity_regularizer=None,
                 kernel_initializer="GlorotUniform", trainable=True, seed=None, **kwargs):
        super(DenseCustom, self).__init__(
            activity_regularizer=activity_regularizer,
            **kwargs)

        self.C = C  # number of classes
        self.V = V  # number of viewpoints per class
        self.activation_fn = activation
        if self.activation_fn is None:
            self.activation_fn = "linear"
        self.kernel_initializer = kernel_initializer
        self.trainable = trainable
        self.seed = seed

    def build(self, input_shape):
        # only rank-2 (batch, features) inputs are supported
        assert len(input_shape) == 2

        last_dim = input_shape[-1]
        input_shape = tf.TensorShape(input_shape)

        kernel_initializer = self.kernel_initializer
        if type(kernel_initializer) is str:
            kernel_initializer = getattr(tf.keras.initializers, kernel_initializer)()

        # one weight vector of size last_dim for every (class, viewpoint) pair
        self.kernel = self.add_weight('kernel',
                                      shape=[self.C, self.V, last_dim],
                                      initializer=kernel_initializer,
                                      dtype=tf.float32,
                                      trainable=self.trainable)

        self.built = True

You do not need to access the weights of the network inside the loss at training time. You only need to make the network output whatever the loss requires. Here the output should be $$\phi=\frac{M\cdot x_i}{||M||\cdot ||x_i||}$$. The call function would take the form:

    @tf.function
    def call(self, X):
        M = self.kernel

        # normalise each (class, viewpoint) weight vector and each input row
        norm_M = tf.expand_dims(tf.norm(M, ord=2, axis=2), axis=-1)
        norm_X = tf.norm(X, ord=2, axis=-1, keepdims=True)

        # L_R reunion loss, registered on the layer so it shows up in self.losses
        self.add_loss(lambda: tf.reduce_sum(
            (-2 / (self.V * (self.V - 1))) * tf.einsum("CVD,CVD->D", M / norm_M, M / norm_M)))

        # cosine similarity between every input and every (class, viewpoint) weight vector
        return tf.einsum("BD,CVD->BCV", X / norm_X, M / norm_M)

Now you can implement the loss of equation (3) directly, since the output of this layer is $$\phi$$.
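To tie this back to the custom train_step from the question, a minimal sketch might look like the following (loss_eq3 and ReIDModel are placeholders; equation (3) from the paper is not reproduced here):

class ReIDModel(tf.keras.Model):
    # hypothetical wrapper model that contains the DenseCustom layer and produces phi;
    # only train_step is sketched

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            phi = self(x, training=True)   # shape (batch, C, V)
            loss = loss_eq3(y, phi)        # placeholder for the paper's equation (3)
            # fold in the reunion loss that DenseCustom registered via add_loss
            loss += tf.add_n(self.losses)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}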

If you execute this code you get a tensor with the desired shape:

C = 5
V = 7

view_layer = DenseCustom(C, V)

x = tf.ones((10, 4))

view_layer(x).shape

Output:
TensorShape([10, 5, 7])
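Because the reunion loss was registered through add_loss, it is also exposed on the layer once it has been called, so a custom train_step can pick it up from self.losses:

view_layer.losses  # list holding the L_R term added in call()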

Answered By: David Calhas