# Difference between np.dot and np.multiply with np.sum in binary cross-entropy loss calculation

## Question:

I tried the following code but could not see the difference between **np.dot** and **np.multiply with np.sum**.

Here is the **np.dot** code:

```
logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
print(logprobs.shape)
print(logprobs)
cost = (-1/m) * logprobs
print(cost.shape)
print(type(cost))
print(cost)
```

Its output is

```
(1, 1)
[[-2.07917628]]
(1, 1)
<class 'numpy.ndarray'>
[[ 0.693058761039 ]]
```

Here is the code for **np.multiply with np.sum**

```
logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
print(logprobs.shape)
print(logprobs)
cost = - logprobs / m
print(cost.shape)
print(type(cost))
print(cost)
```

Its output is

```
()
-2.07917628312
()
<class 'numpy.float64'>
0.693058761039
```

I can't understand why the type and shape differ when the resulting value is the same in both cases.

Even after squeezing the result of the former code, the **cost value becomes the same as in the latter, but the type is still `numpy.ndarray`**:

```
cost = np.squeeze(cost)
print(type(cost))
print(cost)
```

output is

```
<class 'numpy.ndarray'>
0.6930587610394646
```

## Answers:

`np.dot` is the dot product of two matrices.

```
|A B| . |E F| = |A*E+B*G A*F+B*H|
|C D|   |G H|   |C*E+D*G C*F+D*H|
```

Whereas `np.multiply` does an element-wise multiplication of two matrices.

```
|A B| ⊙ |E F| = |A*E B*F|
|C D|   |G H|   |C*G D*H|
```

When combined with `np.sum`, the two do not give the same result in general; they agree in your case only because of the shapes involved.

```
>>> np.dot([[1,2], [3,4]], [[1,2], [2,3]])
array([[ 5,  8],
       [11, 18]])
>>> np.multiply([[1,2], [3,4]], [[1,2], [2,3]])
array([[ 1,  4],
       [ 6, 12]])
>>> np.sum(np.dot([[1,2], [3,4]], [[1,2], [2,3]]))
42
>>> np.sum(np.multiply([[1,2], [3,4]], [[1,2], [2,3]]))
23
```
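To see when the two approaches do agree, it may help to note that the sum of an element-wise product equals the trace of `A @ B.T`, not the sum of all entries of `A @ B`. The following is a minimal sketch of that identity; the names `A`, `B`, `y` and `a` are illustrative and not taken from the question.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 2], [2, 3]])

# Sum of the element-wise product: 1*1 + 2*2 + 3*2 + 4*3
elementwise_sum = np.sum(np.multiply(A, B))

# Trace (sum of the diagonal) of the matrix product A @ B.T
trace_of_dot = np.trace(np.dot(A, B.T))

print(elementwise_sum, trace_of_dot)

# For row vectors of shape (1, N), the product with the transpose has
# shape (1, 1), so its single entry is the trace.
y = np.array([[1.0, 0.0, 1.0]])
a = np.array([[0.8, 0.2, 0.9]])
print(np.dot(y, a.T)[0, 0], np.sum(y * a))
```

This is why the two cost formulas in the question give the same value even though the 2x2 sums above differ.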

If `Y` and `A2` are (1,N) arrays, then `np.dot(Y, A2.T)` will produce a (1,1) result. It is doing a matrix multiplication of a (1,N) with a (N,1). The `N` dimension is summed over, leaving the (1,1).

With `multiply` the result is (1,N). Sum all values, and the result is a scalar.
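The shapes described above can be checked directly. This is a small sketch with made-up data (the random labels and predictions are assumptions for illustration only):

```python
import numpy as np

N = 5
rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(1, N)).astype(float)  # labels, shape (1, N)
A2 = rng.uniform(0.05, 0.95, size=(1, N))          # predictions, shape (1, N)

dot_result = np.dot(Y, np.log(A2).T)     # (1, N) times (N, 1) gives (1, 1)
mul_result = np.multiply(Y, np.log(A2))  # element-wise, stays (1, N)
sum_result = np.sum(mul_result)          # sums every entry, gives a scalar

print(dot_result.shape)
print(mul_result.shape)
print(np.shape(sum_result))
```

The single entry of `dot_result` and `sum_result` hold the same number; only the container differs.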

If `Y` and `A2` were (N,) shaped (same number of elements, but 1d), then `np.dot(Y, A2)` (no `.T`) would also produce a scalar. From the `np.dot` documentation:

"For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors."

"Returns the dot product of a and b. If a and b are both scalars or both 1-D arrays then a scalar is returned; otherwise an array is returned."
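A quick sketch of the 1-D case, using small illustrative arrays that are not from the question:

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0])  # shape (3,)
a = np.array([0.8, 0.2, 0.9])  # shape (3,)

# 1-D with 1-D is an inner product, so no transpose is needed
inner = np.dot(y, np.log(a))
print(type(inner), np.shape(inner))

# Transposing a 1-D array leaves its shape unchanged
print(a.T.shape)
```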

`squeeze` removes all size-1 dimensions, but still returns an array. In `numpy` an array can have any number of dimensions, from 0 up to a fixed maximum (32 in NumPy 1.x, raised to 64 in NumPy 2.0). So a 0d array is possible. Compare the shape of `np.array(3)`, `np.array([3])` and `np.array([[3]])`.
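The comparison suggested above, plus what `squeeze` does to a (1,1) cost, might look like this (the cost value is copied from the question; `as_float` is an illustrative name):

```python
import numpy as np

# 0-d, 1-d and 2-d arrays holding the same single value
for arr in (np.array(3), np.array([3]), np.array([[3]])):
    print(arr.ndim, arr.shape)

cost = np.array([[0.693058761039]])
squeezed = np.squeeze(cost)
print(type(squeezed), squeezed.shape)  # still an ndarray, now 0-d

as_float = squeezed.item()             # plain Python float
print(type(as_float))
```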

What you're doing is calculating the **binary cross-entropy loss**, which measures how bad the predictions of the model (here: `A2`) are when compared to the true outputs (here: `Y`).
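For reference, the loss that both code versions in the question compute can be written as follows. The symbols are assumptions chosen to match the code: $y^{(i)}$ is the i-th entry of `Y`, $a^{(i)}$ the i-th entry of `A2`, and $m$ the number of examples.

```latex
J = -\frac{1}{m} \sum_{i=1}^{m}
    \left[ y^{(i)} \log a^{(i)} + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - a^{(i)}\bigr) \right]
```

The sum over $i$ is what `np.dot` performs implicitly along the inner dimension and what `np.sum` performs explicitly after `np.multiply`.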

Here is a reproducible example for your case, which should explain why you get a scalar in the second case using `np.sum`:

```
In [88]: Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
In [89]: A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])
In [90]: logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
# `np.dot` returns 2D array since its arguments are 2D arrays
In [91]: logprobs
Out[91]: array([[-0.78914626]])
# m (not defined above) is the number of examples: m = Y.shape[1] = 8
In [92]: cost = (-1/m) * logprobs
In [93]: cost
Out[93]: array([[ 0.09864328]])
In [94]: logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
# np.sum returns scalar since it sums everything in the 2D array
In [95]: logprobs
Out[95]: -0.78914625761870361
```

Note that `np.dot` sums along *only the inner dimensions*, which match here: `(1x8)` and `(8x1)`. So the `8`s disappear during the dot product (matrix multiplication), yielding a `(1x1)` result, which is just a *scalar* but is returned as a 2D array of shape `(1,1)`.

Also, most importantly, note that here `np.dot` is **exactly the same** as `np.matmul`, since the inputs are 2D arrays (i.e. matrices):

```
In [107]: logprobs = np.matmul(Y, (np.log(A2)).T) + np.matmul((1.0-Y),(np.log(1 - A2)).T)
In [108]: logprobs
Out[108]: array([[-0.78914626]])
In [109]: logprobs.shape
Out[109]: (1, 1)
```

### Return result as a *scalar* value

`np.dot` or `np.matmul` returns whatever the resulting array shape would be, based on the input arrays. Even with the `out=` argument it's not possible to return a *scalar* if the inputs are 2D arrays. However, we can call `.item()` on the result (or `np.asscalar()` on NumPy versions before 1.23, where it still exists) to convert it to a scalar if the result array has shape `(1,1)` (or, more generally, is a *scalar* value wrapped in an nD array):

```
# logprobs.item() replaces np.asscalar(logprobs), which was removed in NumPy 1.23
In [123]: logprobs.item()
Out[123]: -0.7891462576187036
In [124]: type(logprobs.item())
Out[124]: float
```

Any `ndarray` of size 1 can be converted to a *scalar* value this way:

```
In [127]: np.array([[[23.2]]]).item()
Out[127]: 23.2
In [128]: np.array([[[[23.2]]]]).item()
Out[128]: 23.2
```
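Putting the pieces together, one way to get a plain float out of the `np.dot` version is to wrap it in a small function. This is only a sketch: `compute_cost` is a hypothetical helper name, and it assumes `Y` and `A2` both have shape (1, m).

```python
import numpy as np

def compute_cost(A2, Y):
    """Binary cross-entropy for (1, m) inputs; hypothetical helper."""
    m = Y.shape[1]  # number of examples, assuming shape (1, m)
    logprobs = np.dot(Y, np.log(A2).T) + np.dot(1.0 - Y, np.log(1.0 - A2).T)
    cost = -logprobs / m  # still an array of shape (1, 1)
    return cost.item()    # size-1 array converted to a Python float

Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])

cost = compute_cost(A2, Y)
print(type(cost), cost)
```

Returning `cost.item()` rather than `np.squeeze(cost)` avoids the 0-d array the question ran into.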

In this example it is not just a coincidence. Take two (1,3) matrices:

```
import numpy as np

x1 = np.array([[1, 2, 3]])  # first array, shape (1, 3)
x2 = np.array([[3, 4, 3]])  # second array, shape (1, 3)

# Element-wise multiplication followed by a sum:
# (1*3) + (2*4) + (3*3) = 20
X_Res = np.sum(np.multiply(x1, x2))

# To get a (1,1) matrix from the dot of two (1,3) matrices,
# the second one must be transposed:
# |1 2 3| . |3|
#           |4| = |1*3 + 2*4 + 3*3| = |20|
#           |3|
Y_Res = np.dot(x1, x2.T)

print(X_Res)  # 20
print(Y_Res)  # [[20]]
```