No broadcasting for dot product

Question:

I tried this simple example in Python.

import numpy as np

a = np.array([1,2,3,4])
b = np.array([20])
a + b # broadcasting takes place! 
np.dot(a,b) # no broadcasting here?! 

I thought np.dot also uses broadcasting, but it seems it doesn’t.

I wonder why, i.e. what is the philosophy behind this behavior?

Which operations in NumPy use broadcasting and which do not?

Is there another version of the dot function for dot product,
which actually uses broadcasting?

Asked By: peter.petrov

||

Answers:

The reason it doesn’t broadcast is because the docs say so. However, that’s not a very good, or satisfying, answer to the question. So perhaps I can shed some light on why.

The point of broadcasting is to take operators and apply them pointwise to different shapes of data without the programmer having to explicitly write for loops all the time.

print(a + b)

is way shorter than, and just as readable as, the explicit loop you would write for two sequences of equal length:

my_new_list = []
for a_elem, b_elem in zip(a, b):
    my_new_list.append(a_elem + b_elem)
print(my_new_list)

The reason it works for +, -, and all of those operators is that they're rank 0 (I'm borrowing some terminology from J here). What that means is that, in the absence of any broadcasting rules, + is intended to operate on scalars, i.e. ordinary numbers. The original point of the + operator is to act on numbers, so NumPy comes along and extends that rank 0 behavior to higher ranks, allowing it to work on vectors (rank 1), matrices (rank 2), and tensors (rank 3 and beyond). Again, the terminology is J's, but the concept is the same in NumPy.
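
To make the rank 0 idea concrete, here is a small sketch (the variable names are mine, not from the docs). It assumes the arrays from the question and uses np.broadcast_arrays to show the stretching step explicitly, then applies the scalar addition to each pair of elements by hand:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([20])

# Stretch both inputs to their common broadcast shape, here (4,).
a_stretched, b_stretched = np.broadcast_arrays(a, b)

# Apply the scalar (rank 0) operation to each pair of elements by hand.
by_hand = np.array([x + y for x, y in zip(a_stretched, b_stretched)])

print(a_stretched.shape, b_stretched.shape)  # (4,) (4,)
print(np.array_equal(by_hand, a + b))        # True
```

The loop only ever adds two numbers; broadcasting is the bookkeeping that decides which numbers get paired.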

Now, the fundamental difference is that dot doesn't work that way. The dot product function, in NumPy at least, is already special-cased to do different things for arguments of different rank. For rank 1 arrays (vectors), it performs an inner product, what we usually call a "dot product" in a beginner calculus course. For rank 2 arrays (matrices), it acts like matrix multiplication. For higher-rank arrays, it's an appropriate generalization of matrix multiplication that you can read about in the docs linked above. But the point is that dot already works for all ranks. It's not an atomic operation, so we can't meaningfully broadcast it.

If dot were specialized to only work on rank 1 vectors, and it only performed the beginner calculus rank 1 inner product, then we would call it a rank 1 operator, and it could be broadcast over higher-rank tensors. So, for instance, this hypothetical dot function, which is designed to work on two arguments, each of shape (n,), could be applied to two arguments of shape (m, n) and (m, n), where the operation would be applied to each pair of corresponding rows. But NumPy's dot has different behavior. They decided (and probably rightly so) that dot should handle its own "broadcasting"-like behavior because it can do something smarter than just apply the operation pointwise.
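
If what you want is that hypothetical row-by-row inner product, one way to sketch it (an illustration with made-up arrays x, y, and v, not a dedicated NumPy function) is to multiply elementwise, which does broadcast, and then sum over the last axis. np.einsum expresses the same thing for the equal-shape case:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
y = np.array([[1, 0, 1],
              [2, 2, 2]])   # shape (2, 3)

# Inner product of each row of x with the matching row of y.
rowwise = (x * y).sum(axis=-1)
rowwise_einsum = np.einsum('ij,ij->i', x, y)

# The elementwise product broadcasts, so a single vector of shape (3,)
# is paired with every row of x.
v = np.array([1, 0, 1])
against_one_vector = (x * v).sum(axis=-1)

print(rowwise.tolist())             # [4, 30]
print(against_one_vector.tolist())  # [4, 10]
```

Here the broadcasting happens in the multiplication, and the summation over the last axis is what turns each row pair into a single number.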

Answered By: Silvio Mayolo

Your 2 arrays and their shapes:

In [21]: a = np.array([1,2,3,4])
    ...: b = np.array([20])    
In [22]: a.shape, b.shape
Out[22]: ((4,), (1,))

By rules of broadcasting, for a binary operator like times or add, the (1,) broadcasts to (4,), and it does element-wise operation:

In [23]: a*b
Out[23]: array([20, 40, 60, 80])

dot raises this error:

In [24]: np.dot(a,b)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [24], in <cell line: 1>()
----> 1 np.dot(a,b)

File <__array_function__ internals>:5, in dot(*args, **kwargs)

ValueError: shapes (4,) and (1,) not aligned: 4 (dim 0) != 1 (dim 0)

For 1d arrays dot expects an exact match in shapes, as in np.dot(a,a), which computes the 'dot product' of a with itself: the sum of its elements squared. It does not expand the (1,) to (4,) as with broadcasting. And that fits the usual expectations of a linear algebra inner product. Similarly with 2d, a (n,m) works with a (m,k) to produce a (n,k). The last dimension of A must match the 2nd to last dimension of B. Again, this is basic matrix multiplication. It does a sum-of-products on the shared m dimension.
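
A quick sketch of those shape rules, using small arrays of ones chosen only for illustration (A and B are my names):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
inner = np.dot(a, a)           # 1 + 4 + 9 + 16
print(inner)                   # 30

A = np.ones((2, 3))            # (n, m)
B = np.ones((3, 5))            # (m, k)
print(np.dot(A, B).shape)      # (2, 5), i.e. (n, k)

# When the shared dimension does not match, dot raises instead of stretching.
try:
    np.dot(A, A)               # (2, 3) with (2, 3): 3 != 2
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)         # True
```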

Expanding a to (4,1) allows it to pair with the (1,) to produce a (4,). That's not broadcasting. The 1 is the sum-of-products dimension.

In [25]: np.dot(a[:,None],b)
Out[25]: array([20, 40, 60, 80])

dot also works with a scalar b – again that’s documented.

In [26]: np.dot(a,20)
Out[26]: array([20, 40, 60, 80])

np.dot docs mention the np.matmul/@ alternative several times. matmul behaves the same for 1d and 2d arrays, though its explanation is a bit different. It doesn't accept the scalar argument.
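
Where matmul does differ is with 3d and higher inputs: it treats the leading dimensions as a stack of matrices and broadcasts them, while dot combines them. A small sketch with arbitrary shapes (stack and single are illustrative names), which also checks the scalar case mentioned above:

```python
import numpy as np

stack = np.arange(24).reshape(2, 3, 4)    # two (3, 4) matrices
single = np.ones((1, 4, 5), dtype=int)    # one (4, 5) matrix, batch size 1

# matmul broadcasts the leading "batch" dimension: 1 stretches to 2.
print(np.matmul(stack, single).shape)     # (2, 3, 5)

# dot does not broadcast; it keeps the remaining axes of both operands.
print(np.dot(stack, single).shape)        # (2, 3, 1, 5)

# matmul rejects a scalar operand, unlike dot.
a = np.array([1, 2, 3, 4])
try:
    np.matmul(a, 20)
    scalar_rejected = False
except ValueError:
    scalar_rejected = True
print(scalar_rejected)                    # True
```

So broadcasting in matmul applies only to the stacking dimensions; the last two dimensions still have to line up as in ordinary matrix multiplication.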

Answered By: hpaulj

I think the simple answer in less-technical terms is that array broadcasting only makes sense for element-wise operations such as +, -, *, /, **.

Maybe this is what they mean by "arithmetic operations" in the documentation:

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations

I agree it would be nice if they were more explicit about which operators allow broadcasting.

The important characteristic of element-wise operations is that both arrays must be the same size. This makes broadcasting behaviour easier to predict because it should always do the obvious thing to make the sizes match.

For operators that take a and b of different sizes, it may not be clear at all what broadcasting should do. Indeed, there may be more than one possible expected result that may seem obvious.

For example,

a = np.array([[1, 2, 3]])
b = np.array([[10], [20], [30]])
print(a + b)

# [[11 12 13]
#  [21 22 23]
#  [31 32 33]]

This is quite clear.

But what should the result be if np.dot used broadcasting?

np.dot(a, b)

# array([[140]])  # this is the actual result

# or
np.dot(a, np.repeat(b, 3, 1))

# array([[140, 140, 140]])  # with broadcasting of b

# or 
np.dot(np.repeat(a, 3, 0), b)

# array([[140],
#        [140],
#        [140]])  # with broadcasting of a

# or
np.dot(np.repeat(a, 3, 0), np.repeat(b, 3, 1))

# array([[140, 140, 140],
#        [140, 140, 140],
#        [140, 140, 140]])  # with broadcasting of both

Answered By: Bill