When is the watch() function required in Tensorflow to enable tracking of gradients?

Question:

I’m puzzled by the fact that I’ve seen blocks of code that require tf.GradientTape().watch() to work and blocks that seem to work without it.

For example, this block of code requires the watch() function:

x = tf.ones((2,2))  # Note: In the original posting of the question,
                    #       I didn't include this key line.
with tf.GradientTape() as t:
    # Record the actions performed on tensor x with `watch`
    t.watch(x)

    # Define y as the sum of the elements in x
    y = tf.reduce_sum(x)

    # Let z be the square of y
    z = tf.square(y)

# Get the derivative of z wrt the original input tensor x
dz_dx = t.gradient(z, x)

But, this block does not:

with tf.GradientTape() as tape:
    logits = model(images, training=True)
    loss_value = loss_object(labels, logits)

loss_history.append(loss_value.numpy().mean())
grads = tape.gradient(loss_value, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))

What is the difference between these two cases?

Asked By: user1245262


Answers:

The documentation of watch says the following:

Ensures that tensor is being traced by this tape.

Any trainable variable that is accessed within the context of the tape is watched by default. This means we can calculate the gradient with respect to that trainable variable by calling tape.gradient(loss, variable). Check the following example:

def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets, training=True)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)

Hence there is no need to use tape.watch in the above code. But sometimes we need to calculate gradients with respect to a tensor that is not a trainable variable. In those cases we need to watch it explicitly.

with tf.GradientTape() as t:
    t.watch(images)
    predictions = cnn_model(images)
    loss = tf.keras.losses.categorical_crossentropy(expected_class_output, predictions)
gradients = t.gradient(loss, images)

In the above code, images is an input to the model, not a trainable variable. I need to calculate the gradient of the loss with respect to the images, and hence I need to watch it.
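For reference, here is a minimal, self-contained sketch of the same idea; the tiny model, input, and labels below are made up purely for illustration. The input tensor is a plain EagerTensor rather than a Variable, so it has to be watched before gradients with respect to it can be computed.

import tensorflow as tf

# Hypothetical tiny model and data, just to make the snippet runnable
cnn_model = tf.keras.Sequential([tf.keras.layers.Dense(3, activation="softmax")])
images = tf.random.uniform((4, 8))                       # plain EagerTensor, not a Variable
expected_class_output = tf.one_hot([0, 1, 2, 0], depth=3)

with tf.GradientTape() as t:
    t.watch(images)                                       # required: images is not trainable
    predictions = cnn_model(images)
    loss = tf.keras.losses.categorical_crossentropy(expected_class_output, predictions)

# Gradient of the loss with respect to the (watched) input tensor
gradients = t.gradient(loss, images)
print(gradients.shape)                                    # (4, 8), same shape as images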

Answered By: MSS

Although the answer and comment given above were helpful, I think this code most clearly illustrates what I needed to know.

# Define a 2x2 array of 1's
x = tf.ones((2,2))
x1 = tf.Variable([[1,1],[1,1]],name='x1',dtype=tf.float32)

with tf.GradientTape(persistent=True) as t:
    # Note: x is NOT watched here; x1, being a tf.Variable, is watched automatically

    # Define y as the sum of the elements in x
    y = tf.reduce_sum(x) + tf.reduce_sum(x1)

    # Let z be the square of y
    z = tf.square(y) 

# Get the derivative of z wrt the original input tensor x
dz_dx = t.gradient(z, x)
dz_dx1 = t.gradient(z, x1)

# Print results
print("dz_dx =",dz_dx)
print("dz_dx1 =", dz_dx1)
print("type(x) =", type(x))
print("type(x1) =", type(x1))

which gives,

dz_dx = None
dz_dx1 = tf.Tensor(
[[16. 16.]
 [16. 16.]], shape=(2, 2), dtype=float32)
type(x) = <class 'tensorflow.python.framework.ops.EagerTensor'>
type(x1) = <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable'>

In TensorFlow, all variables are given the property trainable=True by default, so they are automatically watched. I neglected (out of ignorance) to originally indicate that x was not a variable but, instead, an EagerTensor, which is not watched by default.

model.trainable_variables is a list of trainable variables, which the gradient tape watches automatically, whereas a regular tensor is not watched unless the tape is specifically told to watch it.
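As a sanity check, here is a sketch of the same example with t.watch(x) added back in, assuming the x and x1 defined above; the gradient with respect to the plain EagerTensor x is then no longer None.

with tf.GradientTape(persistent=True) as t:
    t.watch(x)                            # explicitly watch the EagerTensor
    y = tf.reduce_sum(x) + tf.reduce_sum(x1)
    z = tf.square(y)

print("dz_dx =", t.gradient(z, x))        # now a (2, 2) tensor of 16.0, no longer None
print("dz_dx1 =", t.gradient(z, x1))      # still a (2, 2) tensor of 16.0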

Answered By: user1245262