Eager execution: gradient computation
Question:
I'm wondering why this very simple gradient computation is not working correctly. It actually produces a [None, None] vector, which is obviously not the desired output.
import tensorflow as tf
tf.enable_eager_execution()

a = tf.constant(0.)
with tf.GradientTape() as tape:
    b = 2 * a
da, db = tape.gradient(a + b, [a, b])
print(da)
print(db)
Expected Output:
da = 3 and db = 1
Answers:
There are two minor issues with the code snippet you posted:

- The a + b computation is happening outside the context of the tape, so it is not being recorded. Note that GradientTape can only differentiate computation that is recorded; computing a + b inside the tape context will fix that.
- Source tensors need to be "watched". There are two ways to signal to the tape that a tensor should be watched: (a) explicitly invoking tape.watch, or (b) using a tf.Variable (all variables are watched automatically); see the documentation.
Long story short, two trivial modifications to your snippet do the trick:
import tensorflow as tf
tf.enable_eager_execution()

a = tf.constant(0.)
with tf.GradientTape() as tape:
    tape.watch(a)
    b = 2 * a
    c = a + b
da, db = tape.gradient(c, [a, b])
print(da)
print(db)
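For completeness, here is a minimal sketch of option (b) from above: using a tf.Variable instead of a constant, so the tape watches it automatically and no explicit tape.watch call is needed. (This sketch assumes TF 2.x, where eager execution is on by default; on 1.x you would keep the tf.enable_eager_execution() call.)

```python
import tensorflow as tf

# Variables are watched by the tape automatically,
# so no tape.watch(a) is required.
a = tf.Variable(0.)
with tf.GradientTape() as tape:
    b = 2 * a
    c = a + b  # computed inside the tape context, so it is recorded

# c = a + 2a = 3a, so dc/da = 3 and dc/db = 1
da, db = tape.gradient(c, [a, b])
print(da.numpy())  # 3.0
print(db.numpy())  # 1.0
```

The gradient with respect to b works here because b itself was computed inside the tape context and is therefore part of the recorded computation.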
Hope that helps.