# In TensorFlow, why can an m*n matrix add an n*1 matrix?

## Question:

I am very new to Python and TensorFlow. Recently I ran into a problem while studying "MNIST For ML Beginners" (https://www.tensorflow.org/get_started/mnist/beginners).

In this tutorial, we use `y = tf.nn.softmax(tf.matmul(X, W) + b)` to get our outputs.

My question is: suppose X is a [100, 784] matrix, W is a [784, 10] matrix, and b is a tensor (like a [10, 1] matrix?). After we call tf.matmul(X, W) we get a [100, 10] matrix. How can a [100, 10] matrix be added to the tensor b? It does not make any sense to me.

I know why there are biases and why they need to be added. I just do not know how the "+" operator works in this case.

## Answer:

This works because of a concept called broadcasting, which exists in both NumPy and TensorFlow. At a high level, this is how it works:

Suppose you're working with an op that supports broadcasting (e.g. + or *) and takes 2 input tensors, X and Y. To decide whether the shapes of X and Y are compatible, the op compares their dimensions in pairs, starting from the right. Two dimensions are considered compatible if:

• They are equal
• One of them is 1
• One of them is missing
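A minimal NumPy sketch of these three rules (NumPy follows the same broadcasting convention as TensorFlow; the shapes here match the ones discussed below):

```python
import numpy as np

# Shapes are compared pairwise from the right; a missing leading
# dimension is treated as if it were there and stretchable.
a = np.zeros((100, 10))   # rank-2 tensor
b = np.zeros((10,))       # rank-1: the leading dimension is "missing"
c = np.zeros((100, 1))    # trailing dimension of size 1

print((a + b).shape)  # (100, 10): 10 matches 10, 100 pairs with "missing"
print((a + c).shape)  # (100, 10): the size-1 dimension is stretched to 10
```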

Applying these rules to the add operation (+) and your inputs of shape [100, 10] and [10]:

• 10 and 10 are compatible
• 100 and 'missing' are compatible

If the shapes are compatible and one of the dimensions of an input is 1 or missing, the op will essentially tile that input to match the shape of the other input.
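The "tiling" described above can be made explicit with NumPy's `np.tile`; this sketch shows that broadcasting a row vector against a matrix gives the same result as tiling it first:

```python
import numpy as np

row = np.arange(3.0)   # shape (3,)
mat = np.ones((4, 3))  # shape (4, 3)

# Broadcasting adds `row` to every row of `mat`...
broadcast_sum = mat + row

# ...which is equivalent to tiling `row` up to shape (4, 3) first.
tiled_sum = mat + np.tile(row, (4, 1))

print(np.array_equal(broadcast_sum, tiled_sum))  # True
```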

In your example, the add op will effectively tile Y of shape [10] to shape [100, 10] before doing the addition.
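Putting it together with the tutorial's line, here is a sketch of the full computation (this assumes TensorFlow 2's eager execution; the original tutorial used TF 1). Note that b in the tutorial has shape [10], not [10, 1] as guessed in the question; a [10, 1] bias would not broadcast against a [100, 10] matrix:

```python
import tensorflow as tf

X = tf.random.normal([100, 784])  # a batch of 100 flattened MNIST images
W = tf.zeros([784, 10])           # weights
b = tf.zeros([10])                # rank-1 bias: shape [10], not [10, 1]

logits = tf.matmul(X, W) + b      # [100, 10] + [10]: b is broadcast
y = tf.nn.softmax(logits)         # across all 100 rows before the add

print(y.shape)  # (100, 10)
```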