numpy np.array versus np.matrix (performance)

Question:

Often when working with numpy I find the distinction between np.array and np.matrix annoying: when I pull out a vector or a row from a matrix and then perform operations with np.arrays, there are usually problems.

To reduce headaches, I’ve taken to sometimes just using np.matrix (converting all np.arrays to np.matrix) for simplicity. However, I suspect there are some performance implications. Could anyone comment on what those might be and why?

It seems that if they are both just arrays under the hood, element access is simply an offset calculation to get the value, so without reading through the entire source I’m not sure what the difference might be.

More specifically, what performance implications does this have:

v = np.matrix([1, 2, 3, 4])
# versus the below
w = np.array([1, 2, 3, 4])

thanks

Asked By: lollercoaster


Answers:

There is a general discussion on SciPy.org and on this question.

To compare performance, I did the following in IPython. It turns out that constructing an array is roughly twice as fast as constructing a matrix.

In [1]: import numpy as np
In [2]: %%timeit
   ...: v = np.matrix([1, 2, 3, 4])
100000 loops, best of 3: 16.9 us per loop

In [3]: %%timeit
   ...: w = np.array([1, 2, 3, 4])
100000 loops, best of 3: 7.54 us per loop

So, at least for construction, numpy arrays seem to be faster than numpy matrices.
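The same comparison can be reproduced outside IPython with the standard timeit module. This is only a minimal sketch: the helper name time_construction and the loop counts are illustrative choices, not part of the measurement above, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit
import warnings

import numpy as np

# Newer NumPy releases may emit a PendingDeprecationWarning when an
# np.matrix is created; silence it so it does not clutter the output.
warnings.simplefilter("ignore", PendingDeprecationWarning)


def time_construction(constructor, number=10000, repeat=3):
    """Return the best per-call time, in microseconds, of constructor([1, 2, 3, 4])."""
    timer = timeit.Timer(lambda: constructor([1, 2, 3, 4]))
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e6


if __name__ == "__main__":
    print("np.array : %.2f us per call" % time_construction(np.array))
    print("np.matrix: %.2f us per call" % time_construction(np.matrix))
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports in the session above.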

Versions used:

Numpy: 1.7.1

IPython: 0.13.2

Python: 2.7

Answered By: Lee

I added some more tests, and it appears that an array is considerably faster than a matrix when the arrays/matrices are small, but the difference gets smaller for larger data structures:

Small (2×4):

In [11]: a = [[1,2,3,4],[5,6,7,8]]

In [12]: aa = np.array(a)

In [13]: ma = np.matrix(a)

In [14]: %timeit aa.sum()
1000000 loops, best of 3: 1.77 us per loop

In [15]: %timeit ma.sum()
100000 loops, best of 3: 15.1 us per loop

In [16]: %timeit np.dot(aa, aa.T)
1000000 loops, best of 3: 1.72 us per loop

In [17]: %timeit ma * ma.T
100000 loops, best of 3: 7.46 us per loop
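For reference, the two expressions timed above compute the same product: * on an np.matrix means matrix multiplication, which for plain arrays is spelled np.dot. A small check on the same small data (the variable names array_product, matrix_product, elementwise and same_result are only illustrative):

```python
import warnings

import numpy as np

# np.matrix may emit a PendingDeprecationWarning on newer NumPy releases.
warnings.simplefilter("ignore", PendingDeprecationWarning)

a = [[1, 2, 3, 4], [5, 6, 7, 8]]
aa = np.array(a)
ma = np.matrix(a)

array_product = np.dot(aa, aa.T)  # explicit matrix product on arrays
matrix_product = ma * ma.T        # '*' is the matrix product for np.matrix

# On plain arrays '*' is element-wise, which is a different operation.
elementwise = aa * aa

# Compare the two matrix products after viewing the matrix as a plain array.
same_result = np.array_equal(array_product, np.asarray(matrix_product))
print(same_result)
```

So the timings compare like with like; only the container type differs.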

Larger (100×100):

In [19]: aa = np.arange(10000).reshape(100,100)

In [20]: ma = np.matrix(aa)

In [21]: %timeit aa.sum()
100000 loops, best of 3: 9.18 us per loop

In [22]: %timeit ma.sum()
10000 loops, best of 3: 22.9 us per loop

In [23]: %timeit np.dot(aa, aa.T)
1000 loops, best of 3: 1.26 ms per loop

In [24]: %timeit ma * ma.T
1000 loops, best of 3: 1.24 ms per loop

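Notice that for the larger case the multiplication timings are essentially equal; the matrix version is even marginally faster (1.24 ms versus 1.26 ms), though a difference that small may well be within measurement noise.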

I believe that what I am getting here is consistent with what @Jaime is explaining in the comment.
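One way to probe this pattern is to time the same reduction at several sizes and look at the matrix/array time ratio. np.matrix is a subclass of np.ndarray, so extra per-call work in the subclass is a plausible source of a roughly fixed overhead that matters less as the data grows; the sketch below only measures the ratio and does not prove the cause. The function name sum_time_ratio and the chosen sizes are assumptions made for illustration.

```python
import timeit
import warnings

import numpy as np

# np.matrix may emit a PendingDeprecationWarning on newer NumPy releases.
warnings.simplefilter("ignore", PendingDeprecationWarning)


def sum_time_ratio(n, number=2000):
    """Return (time of matrix.sum()) / (time of array.sum()) for n x n data."""
    aa = np.arange(n * n).reshape(n, n)
    ma = np.matrix(aa)
    t_array = timeit.timeit(aa.sum, number=number)
    t_matrix = timeit.timeit(ma.sum, number=number)
    return t_matrix / t_array


if __name__ == "__main__":
    # np.matrix is built on top of ndarray rather than being a separate type.
    print("matrix subclasses ndarray:", issubclass(np.matrix, np.ndarray))
    for n in (2, 10, 100, 300):
        print("n=%d  matrix/array sum time ratio: %.2f" % (n, sum_time_ratio(n)))
```

If the overhead is mostly fixed per call, the printed ratio should drift toward 1 as n grows, in line with the timings above.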

Answered By: Akavall