Difference between np.dot and np.multiply with np.sum in binary cross-entropy loss calculation

Question:

I tried the following code but couldn't see the difference between `np.dot` and `np.multiply` with `np.sum`.

Here is the `np.dot` code:

```python
logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
print(logprobs.shape)
print(logprobs)
cost = (-1/m) * logprobs
print(cost.shape)
print(type(cost))
print(cost)
```

Its output is

```
(1, 1)
[[-2.07917628]]
(1, 1)
<class 'numpy.ndarray'>
[[ 0.693058761039 ]]
```

Here is the code for np.multiply with np.sum

```python
logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
print(logprobs.shape)
print(logprobs)
cost = - logprobs / m
print(cost.shape)
print(type(cost))
print(cost)
```

Its output is

```
()
-2.07917628312
()
<class 'numpy.float64'>
0.693058761039
```

I don't understand why the type and shape differ when the resulting value is the same in both cases.

Even after squeezing the cost from the first version, its value becomes the same as in the second version, but the type stays the same:

```python
cost = np.squeeze(cost)
print(type(cost))
print(cost)
```

The output is:

```
<class 'numpy.ndarray'>
0.6930587610394646
```

`np.dot` is the dot product of two matrices.

```
|A B| . |E F| = |A*E+B*G A*F+B*H|
|C D|   |G H|   |C*E+D*G C*F+D*H|
```

`np.multiply`, on the other hand, performs an element-wise multiplication of two matrices.

```
|A B| ⊙ |E F| = |A*E B*F|
|C D|   |G H|   |C*G D*H|
```

For general matrices, summing the two results with `np.sum` does not give the same value:

```
>>> np.dot([[1,2], [3,4]], [[1,2], [2,3]])
array([[ 5,  8],
       [11, 18]])
>>> np.multiply([[1,2], [3,4]], [[1,2], [2,3]])
array([[ 1,  4],
       [ 6, 12]])

>>> np.sum(np.dot([[1,2], [3,4]], [[1,2], [2,3]]))
42
>>> np.sum(np.multiply([[1,2], [3,4]], [[1,2], [2,3]]))
23
```

If `Y` and `A2` are (1,N) arrays, then `np.dot(Y, A2.T)` will produce a (1,1) result. It is doing a matrix multiplication of a (1,N) array with an (N,1) array. The N axis is summed over, leaving (1,1).

With `multiply` the result is (1,N). Sum all of its values with `np.sum`, and the result is a scalar.
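A minimal sketch of the two shapes, using small made-up (1,N) arrays `y` and `a` (hypothetical values, not the question's data):

```python
import numpy as np

# Hypothetical (1, N) row vectors with N = 4
y = np.array([[1.0, 0.0, 1.0, 1.0]])
a = np.array([[0.9, 0.2, 0.7, 0.6]])

# Matrix product of (1, N) and (N, 1): the N axis is summed away, leaving (1, 1)
dot_result = np.dot(y, a.T)
print(dot_result.shape)   # (1, 1)

# The element-wise product keeps the (1, N) shape ...
prod = np.multiply(y, a)
print(prod.shape)         # (1, 4)

# ... and np.sum over all elements collapses it to a NumPy scalar
sum_result = np.sum(prod)
print(sum_result.shape, type(sum_result))   # () <class 'numpy.float64'>

# Same number, different containers
print(np.isclose(dot_result[0, 0], sum_result))  # True
```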

If `Y` and `A2` were (N,) shaped (same number of elements, but 1d), then `np.dot(Y, A2)` (no `.T`) would also produce a scalar. From the `np.dot` documentation:

> For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors

> Returns the dot product of a and b. If a and b are both scalars or both 1-D arrays then a scalar is returned; otherwise an array is returned.
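The 1-D case can be checked the same way; again the values are made up for illustration:

```python
import numpy as np

# Hypothetical data as 1-D arrays of shape (N,)
y = np.array([1.0, 0.0, 1.0, 1.0])
a = np.array([0.9, 0.2, 0.7, 0.6])

# For two 1-D arrays np.dot is the inner product, so no transpose is needed
r = np.dot(y, a)
print(type(r), r.shape)   # <class 'numpy.float64'> ()

# .T is a no-op on a 1-D array: it does not turn y into a column
print(y.T.shape)          # (4,)
```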

`squeeze` removes all size-1 dimensions, but still returns an array. In `numpy` an array can have any number of dimensions, from 0 up to a fixed limit (32 in NumPy 1.x), so a 0d array is possible. Compare the shapes of `np.array(3)`, `np.array([3])` and `np.array([[3]])`.
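The suggested comparison, plus what `squeeze` does to a `(1,1)` array, as a short sketch (the value `0.69` is arbitrary):

```python
import numpy as np

# 0-d, 1-d and 2-d arrays that each hold a single value
for arr in (np.array(3), np.array([3]), np.array([[3]])):
    print(arr.ndim, arr.shape)
# 0 ()
# 1 (1,)
# 2 (1, 1)

# squeeze drops every size-1 axis, but the result is still an ndarray (0-d)
s = np.squeeze(np.array([[0.69]]))
print(type(s), s.shape)   # <class 'numpy.ndarray'> ()
```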

What you’re doing is calculating the binary cross-entropy loss, which measures how bad the model's predictions (here: `A2`) are compared with the true outputs (here: `Y`).
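Written out, with $y^{(i)}$ the entries of `Y`, $a^{(i)}$ the entries of `A2`, and $m$ the number of examples, both snippets compute:

$$
J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{(i)}\right) \right]
$$

The sum over $i$ is what `np.dot` performs implicitly through the matrix product and what `np.sum` performs explicitly.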

Here is a reproducible example for your case, which should explain why you get a scalar in the second case using `np.sum`:

```
In [88]: Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])

In [89]: A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])

In [90]: logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)

# `np.dot` returns a 2D array since its arguments are 2D arrays
In [91]: logprobs
Out[91]: array([[-0.78914626]])

# `m` is the number of examples; here m = Y.shape[1] = 8
In [92]: cost = (-1/m) * logprobs

In [93]: cost
Out[93]: array([[ 0.09864328]])

In [94]: logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))

# `np.sum` returns a scalar since it sums everything in the 2D array
In [95]: logprobs
Out[95]: -0.78914625761870361
```

Note that `np.dot` sums only over the inner dimensions, which match here: `(1x8)` and `(8x1)`. So the `8`s disappear during the dot product (matrix multiplication), leaving a `(1x1)` result: a single number, but returned as a 2D array of shape `(1,1)`.

Also, most importantly, note that here `np.dot` is exactly the same as `np.matmul`, since the inputs are 2D arrays (i.e. matrices):

```
In [107]: logprobs = np.matmul(Y, (np.log(A2)).T) + np.matmul((1.0-Y),(np.log(1 - A2)).T)

In [108]: logprobs
Out[108]: array([[-0.78914626]])

In [109]: logprobs.shape
Out[109]: (1, 1)
```
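Since Python 3.5 the `@` operator performs `np.matmul` on NumPy arrays, so the same computation can be written more compactly. A sketch using the example data from above (`m` is taken from `Y.shape[1]`, which is 8 here):

```python
import numpy as np

Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])
m = Y.shape[1]  # number of examples

# `@` is matrix multiplication (np.matmul) for 2D arrays
logprobs = Y @ np.log(A2).T + (1 - Y) @ np.log(1 - A2).T
print(logprobs.shape)  # (1, 1)

# Still a (1, 1) array, exactly like the np.dot / np.matmul versions
cost = -logprobs / m
print(cost)
```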

Return result as a scalar value

`np.dot` or `np.matmul` returns whatever the resulting array shape would be, based on the input arrays. Even with the `out=` argument it’s not possible to return a scalar if the inputs are 2D arrays. However, if the result has shape `(1,1)` (or, more generally, is a single value wrapped in an nD array), we can convert it to a scalar. Older NumPy versions provided `np.asscalar()` for this, but it was deprecated in NumPy 1.16 and removed in NumPy 1.23; the equivalent is the array's `.item()` method:

```
In [123]: logprobs.item()
Out[123]: -0.7891462576187036

In [124]: type(logprobs.item())
Out[124]: float
```

ndarray of size 1 to scalar value

```
In [127]: np.array([[[23.2]]]).item()
Out[127]: 23.2

In [128]: np.array([[[[23.2]]]]).item()
Out[128]: 23.2
```
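To connect this to the types seen in the question, here is a sketch contrasting indexing with `.item()` on a `(1,1)` array (the array is a made-up stand-in for `cost`):

```python
import numpy as np

# Stand-in for the (1, 1) cost produced by the np.dot version
cost = np.array([[0.6930587610394646]])

indexed = cost[0, 0]     # NumPy scalar, the same type np.sum returned in the question
as_python = cost.item()  # plain Python float

print(type(indexed))     # <class 'numpy.float64'>
print(type(as_python))   # <class 'float'>

# numpy.float64 subclasses Python's float, so both behave like floats
print(isinstance(indexed, float))  # True
print(indexed == as_python)        # True
```

In practice either form can usually be passed wherever a float is expected; `.item()` is the choice when a genuine built-in `float` is needed.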
In this example it is not a coincidence. Let's take an example with two (1,3) matrices:

```python
import numpy as np

x1 = np.array([[1, 2, 3]])  # first array, shape (1, 3)
x2 = np.array([[3, 4, 3]])  # second array, shape (1, 3)

# Element-wise multiplication followed by a sum:
# (1*3) + (2*4) + (3*3) = 20
X_Res = np.sum(np.multiply(x1, x2))

# To get a (1,1) matrix from a dot product of two (1,3) matrices,
# the second one must be transposed to (3,1):
#           |3|
# |1 2 3| * |4| = |1*3 + 2*4 + 3*3| = |20|
#           |3|
Y_Res = np.dot(x1, x2.T)

print(X_Res)  # 20
print(Y_Res)  # [[20]]
```
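The reason can be stated generally: for two arrays `A` and `B` of the same shape, the sum of their element-wise product equals the trace of `A @ B.T`. When both are single rows, `A @ B.T` is `(1,1)`, so its trace is just its only entry. A quick numerical check with arbitrary random values (the seed is only for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)

# General case, shape (2, 3): sum(A * B) equals the trace of A @ B.T
# (the off-diagonal entries of A @ B.T are extra terms)
A = rng.random((2, 3))
B = rng.random((2, 3))
elementwise_sum = np.sum(A * B)
print(np.isclose(elementwise_sum, np.trace(A @ B.T)))  # True

# Row-vector case: a @ b.T is (1, 1), so its only entry is the trace
a = rng.random((1, 3))
b = rng.random((1, 3))
print(np.isclose(np.sum(a * b), (a @ b.T)[0, 0]))  # True
```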