# In TensorFlow, why can an m*n matrix add an n*1 matrix?

## Question:

I am very new to Python and TensorFlow. Recently I ran into a problem while studying "MNIST For ML Beginners" (https://www.tensorflow.org/get_started/mnist/beginners).

In this tutorial, we use `y = tf.nn.softmax(tf.matmul(X, W) + b)` to get our outputs.

My question is: suppose X is a [100, 784] matrix, W is a [784, 10] matrix, and b is a tensor (like a [10, 1] matrix?). After we call tf.matmul(X, W) we get a [100, 10] matrix. How can a [100, 10] matrix be added to the tensor b? It does not make any sense to me.

I know why there are biases and why they need to be added. I just do not know how the "+" operator works in this case.

## Answer:

This works because of a concept called broadcasting, which exists in both NumPy and TensorFlow. At a high level, this is how it works:

Suppose you're working with an op that supports broadcasting (e.g. + or *) and takes 2 input tensors, X and Y. To decide whether the shapes of X and Y are compatible, the op compares their dimensions in pairs, starting from the right. Two dimensions are considered compatible if:

• They are equal
• One of them is 1
• One of them is missing
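A minimal NumPy sketch of these three rules (NumPy follows the same broadcasting convention as TensorFlow; the shapes here match the ones discussed below):

```python
import numpy as np

# Shapes are compared pairwise from the right; a missing leading
# dimension is treated as if it were there and stretchable.
a = np.zeros((100, 10))   # rank-2 tensor
b = np.zeros((10,))       # rank-1: the leading dimension is "missing"
c = np.zeros((100, 1))    # trailing dimension of size 1

print((a + b).shape)  # (100, 10): 10 matches 10, 100 pairs with "missing"
print((a + c).shape)  # (100, 10): the size-1 dimension is stretched to 10
```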

Applying these rules to the add operation (+) and your inputs of shape [100, 10] and [10]:

• 10 and 10 are compatible
• 100 and 'missing' are compatible

If the shapes are compatible and one of the dimensions of an input is 1 or missing, the op will essentially tile that input to match the shape of the other input.
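The "tiling" described above can be made explicit with NumPy's `np.tile`; this sketch shows that broadcasting a row vector against a matrix gives the same result as tiling it first:

```python
import numpy as np

row = np.arange(3.0)   # shape (3,)
mat = np.ones((4, 3))  # shape (4, 3)

# Broadcasting adds `row` to every row of `mat`...
broadcast_sum = mat + row

# ...which is equivalent to tiling `row` up to shape (4, 3) first.
tiled_sum = mat + np.tile(row, (4, 1))

print(np.array_equal(broadcast_sum, tiled_sum))  # True
```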

In your example, the add op will effectively tile Y of shape [10] to shape [100, 10] before doing the addition.
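Putting it together with the tutorial's line, here is a sketch of the full computation (this assumes TensorFlow 2's eager execution; the original tutorial used TF 1). Note that b in the tutorial has shape [10], not [10, 1] as guessed in the question; a [10, 1] bias would not broadcast against a [100, 10] matrix:

```python
import tensorflow as tf

X = tf.random.normal([100, 784])  # a batch of 100 flattened MNIST images
W = tf.zeros([784, 10])           # weights
b = tf.zeros([10])                # rank-1 bias: shape [10], not [10, 1]

logits = tf.matmul(X, W) + b      # [100, 10] + [10]: b is broadcast
y = tf.nn.softmax(logits)         # across all 100 rows before the add

print(y.shape)  # (100, 10)
```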