# Matrix multiplication on a pandas dataframe

## Question:

I have the following dataframe:

```
import numpy as np
import pandas as pd
data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data = data, index = ['A', 'B', 'C'], columns = ['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)
df
```

| | Bulgary | Robbery | Car Theft | r_Total |
|---|---|---|---|---|
| A | 30 | 19 | 6 | 55 |
| B | 12 | 23 | 14 | 49 |
| C | 8 | 18 | 20 | 46 |
| c_Total | 50 | 60 | 40 | 150 |

Now, I want to obtain the expectation matrix `E`, where

$$E_{ij} = \frac{r_i}{T} \cdot \frac{c_j}{T} \cdot T$$

with $r_i$ the `r_Total` of row $i$, $c_j$ the `c_Total` of column $j$, and $T$ the grand total.

(For example, $E_{00} = \frac{55}{150} \cdot \frac{50}{150} \cdot 150 \approx 18.33$.) How should I accomplish this?

Of course, there's always the brute-force approach of iterating over i and j in two nested for-loops, but is there a smarter way of doing it?
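For comparison, the brute-force version might look like the sketch below. It rebuilds the dataframe from the question and fills each cell with two nested loops; the names `total`, `rows`, `cols` and `E_loop` are illustrative and not part of the original post.

```python
import pandas as pd

# Rebuild the dataframe from the question, totals included
data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data=data, index=['A', 'B', 'C'],
                  columns=['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)

total = df.iat[-1, -1]   # grand total in the bottom-right cell (150)
rows = df.index[:-1]     # 'A', 'B', 'C' (skip the 'c_Total' row)
cols = df.columns[:-1]   # the three crime columns (skip 'r_Total')

# Fill E one cell at a time: E_ij = (r_i / total) * (c_j / total) * total
E_loop = pd.DataFrame(index=rows, columns=cols, dtype=float)
for i in rows:
    for j in cols:
        r_i = df.at[i, 'r_Total']
        c_j = df.at['c_Total', j]
        E_loop.at[i, j] = (r_i / total) * (c_j / total) * total

print(E_loop.round(2))
```

This does one Python-level operation per cell, which is fine for a 3×3 table but slow for large ones; a vectorized product avoids the explicit loops.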

Mathematically, `r_Total * c_Total / 150`, with every row total multiplied by every column total, would give me the answer I want. It really boils down to how to do matrix multiplication on a dataframe. I tried:

```
df['r_Total'].dot(df.loc['c_Total'])
```

But got the following error:

```
ValueError: matrices are not aligned
```
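A quick way to see why this fails is to inspect the two indexes. The sketch below assumes the dataframe is built exactly as in the question's code: `Series.dot` aligns both operands on their index labels before computing a single inner product, and these two Series share no labels. Even with matching labels, `dot` on two Series would return a scalar, not a 3×3 matrix.

```python
import pandas as pd

data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data=data, index=['A', 'B', 'C'],
                  columns=['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)

r_total = df['r_Total']      # indexed by row labels
c_total = df.loc['c_Total']  # indexed by column labels

print(list(r_total.index))   # ['A', 'B', 'C', 'c_Total']
print(list(c_total.index))   # ['Bulgary', 'Robbery', 'Car Theft', 'r_Total']

# Series.dot requires both Series to carry the same labels; these share none
try:
    r_total.dot(c_total)
    dot_failed = False
except ValueError as err:
    dot_failed = True
    print('ValueError:', err)
```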

I also tried:

```
df['r_Total'].mul(df.loc['c_Total'])
```

But got:

```
A NaN
B NaN
Bulgary NaN
C NaN
Car Theft NaN
Robbery NaN
Total 22500.0
Name: Total, dtype: float64
```

This is not my expected output either. I'm multiplying a column (3×1) by a row (1×3), so I expect the result to be a 3×3 matrix.
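The NaNs come from label alignment: `Series.mul` is element-wise, so pandas first aligns the two Series on the union of their index labels, and any label present in only one of them yields NaN. A label present in both, such as `Total` in the output above, is multiplied normally. With the dataframe built as in the question's code, the two Series share no labels at all, so every entry comes out NaN. A minimal sketch, where `s1` and `s2` are a hypothetical toy pair:

```python
import pandas as pd

data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data=data, index=['A', 'B', 'C'],
                  columns=['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)

r_total = df['r_Total']
c_total = df.loc['c_Total']

# Element-wise multiply after aligning on the union of both indexes
prod = r_total.mul(c_total)
print(len(prod))          # 8: four row labels plus four column labels
print(prod.isna().all())  # True: no label appears in both Series

# Toy example: only the shared label 'x' gets a value; 'y' and 'z' become NaN
s1 = pd.Series([2, 3], index=['x', 'y'])
s2 = pd.Series([10, 20], index=['x', 'z'])
print(s1.mul(s2))
```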

## Answers:

To perform a matrix multiplication, the inner dimensions have to match. Your expected output has shape (3, 3), but the two "matrices" you have are a column and a row stored as one-dimensional Series, so you'll have to make them matrices of shape (3, 1) and (1, 3); their product then has the desired shape (3, 3). You can do that by using double brackets (`[[]]`) when selecting, which returns a DataFrame instead of a Series.
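The effect of single versus double brackets on the shape can be checked directly. A minimal sketch, assuming the question's dataframe; the names `col`, `row` and `flat` are only for illustration.

```python
import pandas as pd

data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data=data, index=['A', 'B', 'C'],
                  columns=['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)

col = df.loc[df.index[:-1], ['r_Total']]    # list of labels -> DataFrame, (3, 1)
row = df.loc[['c_Total'], df.columns[:-1]]  # list of labels -> DataFrame, (1, 3)
flat = df.loc[df.index[:-1], 'r_Total']     # single label -> Series, (3,)

print(col.shape, row.shape, flat.shape)     # (3, 1) (1, 3) (3,)
```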

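To make the multiplication work, you'll have to convert at least one of the dataframes into a NumPy array (that's what `.values` does). Otherwise `DataFrame.dot` tries to align the left operand's column labels with the right operand's index labels, and `r_Total` does not match `c_Total`.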
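A sketch of why the conversion matters, again assuming the question's dataframe. `DataFrame.dot` with another DataFrame aligns labels first and raises `ValueError` here; with a NumPy array on the right, only the shapes are checked. (`.to_numpy()` is the newer spelling of `.values` and returns the same array for this all-numeric frame.)

```python
import pandas as pd

data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data=data, index=['A', 'B', 'C'],
                  columns=['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)

col = df.loc[df.index[:-1], ['r_Total']]    # (3, 1), column label 'r_Total'
row = df.loc[['c_Total'], df.columns[:-1]]  # (1, 3), index label 'c_Total'

# DataFrame @ DataFrame: col.columns must line up with row.index
try:
    col.dot(row)
    aligned = True
except ValueError:
    aligned = False

# DataFrame @ ndarray: no label alignment, only (3, 1) x (1, 3) shapes
product = col.dot(row.to_numpy())
print(product.shape)  # (3, 3)
```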

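Also, to leave out the totals themselves (the `c_Total` row when selecting the column of row totals, and the `r_Total` column when selecting the row of column totals), select only up to the last element.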

Finally, divide by the bottom-right element of the dataframe, `df.iat[-1, -1]`, which holds the grand total (150 in your example).

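```python
# (3, 1) column of row totals, without the grand-total row,
# times a (1, 3) array of column totals, without the grand-total column,
# divided by the grand total (150)
res = df.loc[df.index[:-1], ['r_Total']].dot(df.loc[['c_Total']].values[:, :-1]).div(df.iat[-1, -1])
```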
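The `dot` result gets a default `0, 1, 2` column index because the right operand is a bare array. If you want the crime types back as column labels, one alternative (a swapped-in technique, not the approach above) is `np.outer`, which computes every pairwise product of a row total and a column total directly. A sketch, assuming the question's dataframe:

```python
import numpy as np
import pandas as pd

data = [[30, 19, 6], [12, 23, 14], [8, 18, 20]]
df = pd.DataFrame(data=data, index=['A', 'B', 'C'],
                  columns=['Bulgary', 'Robbery', 'Car Theft'])
df['r_Total'] = df.iloc[:, :3].sum(axis=1)
df.loc['c_Total'] = df.iloc[:3, :].sum(axis=0)

r = df['r_Total'].iloc[:-1]      # row totals for A, B, C
c = df.loc['c_Total'].iloc[:-1]  # column totals for the three crime types
total = df.iat[-1, -1]           # grand total (150)

# np.outer(r, c)[i, j] == r[i] * c[j]; the DataFrame wrapper restores the labels
E = pd.DataFrame(np.outer(r, c) / total, index=r.index, columns=c.index)
print(E.round(2))
```

Both versions produce the same numbers; `np.outer` simply skips reshaping the totals into (3, 1) and (1, 3) first.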