tensorflow GradientDescentOptimizer: Incompatible shapes between op input and calculated input gradient

Question:

The model worked fine before I added the optimization step. However, when I try to optimize the model, the following error message shows up:

Incompatible shapes between op input and calculated input gradient.
Forward operation: softmax_cross_entropy_with_logits_sg_12. Input
index: 0. Original input shape: (16, 1). Calculated input gradient
shape: (16, 16)

The following is my code.

import tensorflow as tf
import numpy as np
batch_size = 16
size = 400
labels  = tf.placeholder(tf.int32, batch_size)
doc_encode  = tf.placeholder(tf.float32, [batch_size, size])

W1 = tf.Variable(np.random.rand(size, 100), dtype=tf.float32, name='W1')
b1 = tf.Variable(np.zeros((100)), dtype=tf.float32, name='b1')

W2 = tf.Variable(np.random.rand(100, 1),dtype=tf.float32, name='W2')
b2 = tf.Variable(np.zeros((1)), dtype=tf.float32, name='b2')

D1 = tf.nn.relu(tf.matmul(doc_encode, W1) + b1)
D2 = tf.nn.selu(tf.matmul(D1, W2) + b2)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=D2))
optim = tf.train.GradientDescentOptimizer(0.01).minimize(cost, aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE)
with tf.Session() as sess:  
    sess.run(tf.global_variables_initializer())
    _cost, _optim = sess.run([cost, optim], {labels:np.array([1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]), doc_encode: np.random.rand(batch_size, size)})
Asked By: GoatWang


Answers:

Correct the following things.

First,

Change the placeholders' shapes to the following:

X = tf.placeholder(tf.float32, shape=[None, 400])
Y = tf.placeholder(tf.float32, shape=[None, 1])

Why None? Because it gives you the freedom to feed a batch of any size. This is preferred because during training you want to use mini-batches, but at prediction or inference time you will generally feed a single example. Marking the batch dimension as None takes care of both cases.
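
As a rough sketch (TF 1.x API, as in the question), a None batch dimension lets the same placeholder accept any batch size at feed time:

import numpy as np
import tensorflow as tf

X = tf.placeholder(tf.float32, shape=[None, 400])  # batch dimension left open
Y = tf.placeholder(tf.float32, shape=[None, 1])

with tf.Session() as sess:
    # The same graph accepts a mini-batch while training...
    print(sess.run(tf.shape(X), {X: np.random.rand(16, 400)}))  # [16 400]
    # ...and a single example at inference time.
    print(sess.run(tf.shape(X), {X: np.random.rand(1, 400)}))   # [ 1 400]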

Second,

Correct your weight initialization. You are initializing with raw random values; it is generally recommended to initialize with small random weights and a slightly positive bias. (I see you are using ReLU as the activation; its gradient is zero for negative pre-activations, so units that start in the negative regime are never updated by gradient descent and become useless "dead" units.)
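
A minimal sketch of such an initialization, assuming a standard deviation of 0.1 and a bias of 0.1 (both values are illustrative choices, not taken from the question):

import tensorflow as tf

size = 400
# small random weights instead of raw np.random.rand values in [0, 1)
W1 = tf.Variable(tf.truncated_normal([size, 100], stddev=0.1), name='W1')
# slightly positive bias so the ReLU units start in the active regime
b1 = tf.Variable(tf.constant(0.1, shape=[100]), name='b1')
W2 = tf.Variable(tf.truncated_normal([100, 1], stddev=0.1), name='W2')
b2 = tf.Variable(tf.constant(0.1, shape=[1]), name='b2')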

Third,

Logits are the raw result of W2*x + b2, and tf.nn.softmax_cross_entropy_with_logits(...) applies the softmax activation internally, so there is no need for the SELU activation on the last layer.
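
Putting the three points together, a rough sketch of the corrected forward pass and loss (keeping the question's layer sizes and learning rate; X, Y and the weights are the placeholders and variables defined above):

D1 = tf.nn.relu(tf.matmul(X, W1) + b1)
logits = tf.matmul(D1, W2) + b2  # raw logits, no activation on the last layer

# softmax_cross_entropy_with_logits applies softmax internally;
# the labels (Y) must have the same shape as the logits
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits))
optim = tf.train.GradientDescentOptimizer(0.01).minimize(cost)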

Answered By: coder3101