Explain the speed difference between numpy's vectorized function application VS python's for loop

Question:

I was implementing a weighting system called TF-IDF on a set of 42000 images, each consisting of 784 pixels. This is basically a 42000 by 784 matrix.

The first method I attempted made use of explicit loops and took more than 2 hours.

def tfidf(color,img_pix,img_total):
    if img_pix==0:
        return 0
    else:
        return color * np.log(img_total/img_pix)

...

result = np.array([])
for img_vec in data_matrix:
    double_vec = zip(img_vec,img_pix_vec)
    result_row = np.array([tfidf(x[0],x[1],img_total) for x in double_vec])
    try:
        result = np.vstack((result,result_row))
    # first row will throw a ValueError since vstack accepts rows of same len
    except ValueError:
        result = result_row

The second method I attempted used numpy matrices and took less than 5 minutes. Note that data_matrix, img_pix_mat are both 42000 by 784 matrices while img_total is a scalar.

result = data_matrix * np.log(np.divide(img_total,img_pix_mat))

I was hoping someone could explain the immense difference in speed.

The authors of the paper "The NumPy array: a structure for efficient numerical computation" (http://arxiv.org/pdf/1102.1523.pdf) state at the top of page 4 that they observe a 500 times speed increase due to vectorized computation. I’m presuming much of the speed increase I’m seeing is due to this. However, I would like to go a step further and ask why numpy vectorized computations are that much faster than standard python loops?

Also, perhaps you guys might know of other reasons why the first method is slow. Do try/except structures have high overhead? Or perhaps forming a new np.array on each iteration of the loop takes a long time?

Thanks.

Asked By: Kao

||

Answers:

The difference you’re seeing isn’t due to anything fancy like SSE vectorization. There are two primary reasons. The first is that NumPy is written in C, and the C implementation doesn’t have to go through the tons of runtime method dispatch and exception checking and so on that a Python implementation goes through.
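To make the first point concrete, here is a minimal sketch that computes the same weighting once with one interpreted call per element and once with whole-array operations. The small random matrix, the way img_pix_vec is built, and the helper names tfidf_loop and tfidf_vectorized are assumptions for illustration, not the asker's actual data or code. The vectorized version also reproduces the img_pix == 0 case from the scalar function, which the one-liner in the question does not handle.

```python
import time
import numpy as np

def tfidf(color, img_pix, img_total):
    # Scalar version from the question.
    if img_pix == 0:
        return 0.0
    return color * np.log(img_total / img_pix)

def tfidf_loop(data_matrix, img_pix_vec, img_total):
    # One Python-level function call per matrix element.
    return np.array([[tfidf(c, p, img_total) for c, p in zip(row, img_pix_vec)]
                     for row in data_matrix])

def tfidf_vectorized(data_matrix, img_pix_vec, img_total):
    pix = np.asarray(img_pix_vec, dtype=float)
    # Start from 1 so that log(ratio) is 0 wherever the count is 0.
    ratio = np.ones_like(pix)
    np.divide(img_total, pix, out=ratio, where=pix != 0)
    # Broadcasting applies the 784 weights to every row in compiled code.
    return data_matrix * np.log(ratio)

# Hypothetical stand-in data: 50 "images" of 784 pixels each.
rng = np.random.default_rng(0)
data_matrix = rng.integers(0, 256, size=(50, 784)).astype(float)
data_matrix[rng.random(data_matrix.shape) < 0.7] = 0.0
img_pix_vec = (data_matrix > 0).sum(axis=0)  # assumed meaning: images in which the pixel is set
img_pix_vec[:5] = 0                          # force a few zero counts
img_total = data_matrix.shape[0]

start = time.perf_counter()
slow = tfidf_loop(data_matrix, img_pix_vec, img_total)
t_loop = time.perf_counter() - start

start = time.perf_counter()
fast = tfidf_vectorized(data_matrix, img_pix_vec, img_total)
t_vec = time.perf_counter() - start

print("loop: %.4fs  vectorized: %.4fs" % (t_loop, t_vec))
```

Both functions return the same array; only the number of trips through the interpreter differs.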

The second reason is that even for Python code, your loop-based implementation is inefficient. You’re using vstack in a loop, and every time you call vstack, it has to completely copy all arrays you’ve passed to it. That adds an extra factor of len(data_matrix) to your asymptotic complexity.
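A minimal sketch of the vstack point, under the assumption that the number of rows and columns is known up front: allocate the result once and write each row in place, instead of re-copying everything accumulated so far on every iteration. The function names and the toy rows are hypothetical.

```python
import numpy as np

def stack_in_loop(rows):
    # Pattern from the question: every vstack call copies all rows
    # accumulated so far, so the total work grows quadratically.
    result = None
    for row in rows:
        result = row if result is None else np.vstack((result, row))
    return result

def fill_preallocated(rows, n_cols):
    # Allocate the full result once, then write each row in place.
    result = np.empty((len(rows), n_cols))
    for i, row in enumerate(rows):
        result[i] = row
    return result

# Hypothetical stand-in rows: 100 rows of 784 values each.
rows = [np.full(784, float(i)) for i in range(100)]
stacked = stack_in_loop(rows)
filled = fill_preallocated(rows, 784)
```

Collecting the rows in a Python list and calling np.vstack once at the end would avoid the repeated copying as well.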

Answered By: user2357112

This is due to the internal workings of numpy. As far as I know, numpy is implemented in C internally, so everything you push down to numpy runs much faster than the equivalent Python code.

Edit:

Taking out the zip and replacing it with a vstack should make it faster too: zip tends to be slow if the arguments are very large, and vstack is faster in that case. Additionally, vstack is numpy (thus C), while zip is Python.

And yes, if I understood correctly (not sure about that), you are executing a try/except block 42k times. That should definitely be bad for the speed.

Test:

import numpy

T = numpy.zeros((5, 10))
# iterating over a 2-D array yields one row at a time
for t in T:
    print(t.shape)

prints (10,) five times, once per row.

This means that yes, if your matrices are 42k by 784, you are entering a try/except block 42k times. I assume that affects the computation time, as does calling zip on every iteration, but I am not certain that either is the main cause.

(Two hours spread over 42k iterations comes to about 0.17 seconds per iteration. I am quite certain that a try/except block by itself doesn’t take 0.17 seconds, but maybe the overhead it causes contributes to it?)
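Rather than guessing, the cost of the try/except itself can be measured. The sketch below is only an assumption about how one might test it: it times 42000 iterations of a trivial loop body with and without a try block that never raises. The functions with_try and without_try are hypothetical stand-ins, not the asker's loop.

```python
import timeit

def with_try(n):
    # Loop body wrapped in a try block whose except clause never fires.
    total = 0
    for i in range(n):
        try:
            total += i
        except ValueError:
            total = 0
    return total

def without_try(n):
    # Same loop body without the try block.
    total = 0
    for i in range(n):
        total += i
    return total

n = 42000  # one iteration per image row, as in the question
runs = 10
t_try = min(timeit.repeat(lambda: with_try(n), number=runs, repeat=3)) / runs
t_plain = min(timeit.repeat(lambda: without_try(n), number=runs, repeat=3)) / runs

print("with try:    %.6f s per %d iterations" % (t_try, n))
print("without try: %.6f s per %d iterations" % (t_plain, n))
```

Whatever difference this prints is the cost of the try/except for the whole run, which can then be compared with the two hours observed.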

Try changing the following:

double_vec = zip(img_vec,img_pix_vec)
result_row = np.array([tfidf(x[0],x[1],img_total) for x in double_vec])

to

result_row = np.array([tfidf(img_vec[i],img_pix_vec[i],img_total)
                       for i in range(len(img_vec))])

That, at least, gets rid of the zip call. I am not sure whether zip accounts for one minute of your run time or for nearly two hours (I know zip is slow compared to numpy.vstack, but I have no clue whether removing it would save you two hours).
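Since it is unclear how much zip matters, here is a minimal sketch for timing the two ways of building a single row. The random vectors are stand-ins for the real img_vec and img_pix_vec, and the function names are hypothetical; the point is only to measure both variants on your own machine rather than to predict which one wins.

```python
import timeit
import numpy as np

def tfidf(color, img_pix, img_total):
    # Scalar version from the question.
    if img_pix == 0:
        return 0.0
    return color * np.log(img_total / img_pix)

# Hypothetical stand-in data for one image row and the per-pixel counts.
rng = np.random.default_rng(0)
img_vec = rng.integers(0, 256, size=784).astype(float)
img_pix_vec = rng.integers(0, 100, size=784)
img_total = 42000

def row_with_zip():
    return np.array([tfidf(c, p, img_total)
                     for c, p in zip(img_vec, img_pix_vec)])

def row_with_index():
    return np.array([tfidf(img_vec[i], img_pix_vec[i], img_total)
                     for i in range(len(img_vec))])

runs = 20
t_zip = timeit.timeit(row_with_zip, number=runs) / runs
t_idx = timeit.timeit(row_with_index, number=runs) / runs
print("zip: %.6f s per row   index: %.6f s per row" % (t_zip, t_idx))
```

Multiplying the per-row difference by 42000 gives an estimate of how much of the total run time the choice between the two could explain.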

Answered By: usethedeathstar