Python optimisations in this code?

Question:

I have two fairly simple code snippets and I’m running both of them a very large number of times; I’m trying to determine whether there’s any optimisation I can do to speed up the execution time, anything that stands out as something that could be done a lot quicker.

In the first one, we’ve got a list, fields. We’ve also got a list of lists, weights. We’re trying to find which weight list, multiplied element-wise by fields, will produce the maximum sum. fields is about 30k entries long.

def find_best(weights,fields):
  winner = -1
  best = -float('inf')
  for c in range(num_category):
    score = 0
    for i in range(num_fields):
      score += float(fields[i]) * weights[c][i]
    if score > best:
      best = score
      winner = c
  return winner

In the second one, we’re trying to update two of our weight lists: one gets increased and one decreased. The amount to increase/decrease each element in the list is equal to the corresponding element in fields (e.g. if fields[4] = 10.5, then we want to increase weights[toincrease][4] by 10.5 and decrease weights[todecrease][4] by 10.5).

def update_weights(weights, fields, toincrease, todecrease):
  for i in range(num_fields):
    update = float(fields[i])
    weights[toincrease][i] += update
    weights[todecrease][i] -= update
  return weights

I hope this isn’t an overly specific question.

Asked By: Fergusmac


Answers:

If you are running Python 2.x I would use xrange() rather than range(); it uses less memory as it doesn’t generate a list.

This is assuming you want to keep the current code structure.

Answered By: Levon

An easy optimisation is to use xrange instead of range. xrange yields results one by one as you iterate over it, whereas range first creates the entire (30,000 item) list as a temporary object, using more memory and CPU cycles.
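A quick way to see the difference (Python 2; a minimal illustration, not part of the original answer):

import sys

n = 10 ** 6
print sys.getsizeof(range(n))   # range() builds the full million-element list up front
print sys.getsizeof(xrange(n))  # xrange() is a small lazy object; values come on demand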

Answered By: Preet Kukreti

I think you could get a pretty big speed boost using numpy. Stupidly simple example:

>>> import numpy
>>> fields = numpy.array([1, 4, 1, 3, 2, 5, 1])
>>> weights = numpy.array([[.2, .3, .4, .2, .1, .5, .9], [.3, .1, .1, .9, .2, .4, .5]])
>>> fields * weights
array([[ 0.2,  1.2,  0.4,  0.6,  0.2,  2.5,  0.9],
       [ 0.3,  0.4,  0.1,  2.7,  0.4,  2. ,  0.5]])
>>> result = _
>>> numpy.argmax(numpy.sum(result, axis=1))
1
>>> result[1]
array([ 0.3,  0.4,  0.1,  2.7,  0.4,  2. ,  0.5])
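
Folding this into the shape of the original function (a minimal sketch, assuming fields and weights are already numpy arrays):

import numpy

def find_best(weights, fields):
    # weights: 2-D array (num_category x num_fields); fields: 1-D array
    scores = numpy.sum(fields * weights, axis=1)  # row-wise sum of the products
    return numpy.argmax(scores)                   # index of the highest-scoring row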
Answered By: Nolen Royalty

First, if you are using Python 2.x, you can gain some speed by using xrange() instead of range(). In Python 3.x there is no xrange(), but the built-in range() is basically the same as xrange().

Next, if we are going for speed, we need to write less code, and rely more on Python’s built-in features (that are written in C for speed).

You could speed things up by using a generator expression inside of sum() like so:

from itertools import izip

def find_best(weights,fields):
    winner = -1
    best = -float('inf')
    for c in xrange(num_category):
        score = sum(float(t[0]) * t[1] for t in izip(fields, weights[c]))
        if score > best:
            best = score
            winner = c
    return winner

Applying the same idea again, let’s try to use max() to find the best result. I think this code is ugly to look at, but if you benchmark it and it’s enough faster, it might be worth it:

from itertools import izip

def find_best(weights, fields):
    tup = max(
        ((i, sum(float(t[0]) * t[1] for t in izip(fields, wlist))) for i, wlist in enumerate(weights)),
        key=lambda t: t[1]
    )
    return tup[0]

Ugh! But if I didn’t make any mistakes, this does the same thing, and it should rely a lot on the C machinery in Python. Measure it and see if it is faster.

So, we are calling max() with a generator expression, and max() will find the largest value the generator yields. But you want the index of the best value, so the generator expression yields a tuple: (index, score), where the score is calculated by the same sum() we used above. The key function looks at the score part of the tuple and ignores the index. Since the generator expression is not the only argument to max(), it needs its own parentheses. Finally, once max() returns the winning tuple, we index it to get the index value, and return that.

We can make this much less ugly if we break out a function. This adds the overhead of a function call, but if you measure it I’ll bet it isn’t too much slower. Also, now that I think about it, it makes sense to build a list of the fields values already pre-coerced to float; then we can reuse that list multiple times. And instead of using izip() to iterate over two lists in parallel, let’s just make an iterator and explicitly ask it for values. In Python 2.x we use the .next() method to ask for a value; in Python 3.x you would use the next() built-in function.

def fweight(field_float_list, wlist):
    f = iter(field_float_list)
    return sum(f.next() * w for w in wlist)

def find_best(weights, fields):
    flst = [float(x) for x in fields]
    tup = max(
        ((i, fweight(flst, wlist)) for i, wlist in enumerate(weights)),
        key=lambda t: t[1]
    )
    return tup[0]

If there are 30K fields values, then pre-computing the float() values is likely to be a big speed win.

EDIT: I missed one trick. Instead of the lambda function, I should have used operator.itemgetter() like some of the code in the accepted answer. Also, the accepted answer timed things, and it does look like the overhead of the function call was significant. But the Numpy answers were so much faster that it’s not worth playing with this answer anymore.
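
For completeness, the itemgetter version would be a one-line change to the function above (my sketch):

import operator

def find_best(weights, fields):
    flst = [float(x) for x in fields]
    tup = max(
        ((i, fweight(flst, wlist)) for i, wlist in enumerate(weights)),
        key=operator.itemgetter(1)  # pulls out t[1] without a lambda call
    )
    return tup[0]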

As for the second part, I don’t think it can be sped up very much. I’ll try:

def update_weights(weights,fields,toincrease,todecrease):
    w_inc = weights[toincrease]
    w_dec = weights[todecrease]
    for i, f in enumerate(fields):
        f = float(f)  # see note below
        w_inc[i] += f
        w_dec[i] -= f

So, instead of iterating over an xrange(), here we just iterate over the fields values directly. We have a line that coerces to float.

Note that if the weights values are already float, we don’t really need to coerce to float here, and we can save time by just deleting that line.

Your code was indexing the weights list four times per loop iteration: twice to do the increment, twice to do the decrement. This code does the first index (using the toincrease or todecrease argument) just once. It still has to index by i in order for += to work. (My first version tried to avoid this with an iterator and didn’t work. I should have tested before posting. But it’s fixed now.)

One last version to try: instead of incrementing and decrementing values as we go, just use list comprehensions to build a new list with the values we want:

def update_weights(weights, field_float_list, toincrease, todecrease):
    f = iter(field_float_list)
    weights[toincrease] = [x + f.next() for x in weights[toincrease]]
    f = iter(field_float_list)
    weights[todecrease] = [x - f.next() for x in weights[todecrease]]

This assumes you have already coerced all the fields values to float, as shown above.

Is it faster, or slower, to replace the whole list this way? I’m going to guess faster, but I’m not sure. Measure and see!
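
If you want to time it, a harness along the lines of the accepted answer would do. Here is a minimal sketch, where update_weights_inplace and update_weights_rebuild are hypothetical names for the two versions above:

import random, timeit

fields = [random.random() for _ in xrange(30000)]
weights = [[random.random() for _ in xrange(30000)] for _ in xrange(2)]

def run_inplace():
    update_weights_inplace(weights, fields, 0, 1)   # the += / -= version

def run_rebuild():
    update_weights_rebuild(weights, fields, 0, 1)   # the list-comprehension version

for f in [run_inplace, run_rebuild]:
    print "%s: %.3f ms" % (f.__name__, timeit.timeit(f, number=100) / 100 * 1000)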

Oh, I should add: note that my version of update_weights() shown above does not return weights. This is because in Python it is considered a good practice to not return a value from a function that mutates a data structure, just to make sure that nobody ever gets confused about which functions do queries and which functions change things.

http://en.wikipedia.org/wiki/Command-query_separation

Measure measure measure. See how much faster my suggestions are, or are not.

Answered By: steveha

As @Levon says, xrange() in Python 2.x is a must. Also, if you are on Python 2.4+ you can use a generator expression (thanks @steveha), which works much like a list comprehension, for your inner loop, as simply as follows:

for i in range(num_fields):
    score += float(fields[i]) * weights[c][i]

equivalent to

score = sum(float(fields[i]) * weights[c][i] for i in xrange(num_fields))

Also, in general, there is a great page on the Python wiki about simple but effective optimization tricks!

Answered By: Zenon

When you are trying to optimise, the thing you have to do is profile and measure! Python provides the timeit module which makes measuring things easy!

This will assume that you’ve converted fields to a list of floats beforehand (outside any of these functions), since the string → float conversion is very slow. You can do this via fields = [float(f) for f in string_fields].

Also, for doing numerical processing, pure python isn’t very good, since it ends up doing a lot of type-checking (and some other stuff) for each operation. Using a C library like numpy will give massive improvements.

find_best

I have incorporated the answers of others (and a few more) into a profiling suite (say, test_find_best.py):

import random, operator, numpy as np, itertools, timeit

fields = [random.random() for _ in range(3000)]
fields_string = [str(field) for field in fields]
weights = [[random.random() for _ in range(3000)] for c in range(100)]

npw = np.array(weights)
npf = np.array(fields)   

num_fields = len(fields)
num_category = len(weights)

def f_original():
  winner = -1
  best = -float('inf')
  for c in range(num_category):
    score = 0
    for i in range(num_fields):
      score += float(fields_string[i]) * weights[c][i]
    if score > best:
      best = score
      winner = c
  
def f_original_no_string():
  winner = -1
  best = -float('inf')
  for c in range(num_category):
    score = 0
    for i in range(num_fields):
      score += fields[i] * weights[c][i]
    if score > best:
      best = score
      winner = c
      
def f_original_xrange():
  winner = -1
  best = -float('inf')
  for c in xrange(num_category):
    score = 0
    for i in xrange(num_fields):
      score += fields[i] * weights[c][i]
    if score > best:
      best = score
      winner = c


# Zenon  http://stackoverflow.com/a/10134298/1256624

def f_index_comprehension():
    winner = -1
    best = -float('inf')
    for c in range(num_category):
      score = sum(fields[i] * weights[c][i] for i in xrange(num_fields))
      if score > best:
        best = score
        winner = c  


# steveha  http://stackoverflow.com/a/10134247/1256624

def f_comprehension():
  winner = -1
  best = -float('inf')

  for c in xrange(num_category):
    score = sum(f * w for f, w in itertools.izip(fields, weights[c]))
    if score > best:
      best = score
      winner = c

def f_schwartz_original(): # https://en.wikipedia.org/wiki/Schwartzian_transform
    tup = max(((i, sum(t[0] * t[1] for t in itertools.izip(fields, wlist))) for i, wlist in enumerate(weights)),
              key=lambda t: t[1]
             )

def f_schwartz_opt(): # https://en.wikipedia.org/wiki/Schwartzian_transform
    tup = max(((i, sum(f * w for f,w in itertools.izip(fields, wlist))) for i, wlist in enumerate(weights)),
              key=operator.itemgetter(1)
             )

def fweight(field_float_list, wlist):
    f = iter(field_float_list)
    return sum(f.next() * w for w in wlist)
        
def f_schwartz_iterate():
     tup = max(
         ((i, fweight(fields, wlist)) for i, wlist in enumerate(weights)),
         key=lambda t: t[1]
      )
                                        
# Nolen Royalty  http://stackoverflow.com/a/10134147/1256624 
                           
def f_numpy_mult_sum():
   np.argmax(np.sum(npf * npw, axis = 1))


# me

def f_imap():
  winner = -1
  best = -float('inf')

  for c in xrange(num_category):
    score = sum(itertools.imap(operator.mul, fields, weights[c]))
    if score > best:
      best = score
      winner = c

def f_numpy():
   np.argmax(npw.dot(npf))



for f in [f_original,
          f_index_comprehension,
          f_schwartz_iterate,
          f_original_no_string,
          f_schwartz_original,
          f_original_xrange,
          f_schwartz_opt,
          f_comprehension,
          f_imap]:
   print "%s: %.2f ms" % (f.__name__, timeit.timeit(f,number=10)/10 * 1000)
for f in [f_numpy_mult_sum, f_numpy]:
   print "%s: %.2f ms" % (f.__name__, timeit.timeit(f,number=100)/100 * 1000)

Running python test_find_best.py gives me:

f_original: 310.34 ms
f_index_comprehension: 102.58 ms
f_schwartz_iterate: 103.39 ms
f_original_no_string: 96.36 ms
f_schwartz_original: 90.52 ms
f_original_xrange: 89.31 ms
f_schwartz_opt: 69.48 ms
f_comprehension: 68.87 ms
f_imap: 53.33 ms
f_numpy_mult_sum: 3.57 ms
f_numpy: 0.62 ms

So the numpy version using .dot (sorry, I can’t find the documentation for it atm) is the fastest. If you are doing a lot of numerical operations (which it seems you are), it might be worth converting fields and weights to numpy arrays as soon as you create them.
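
The conversion itself is one line per array (a sketch, assuming fields may still hold strings):

import numpy as np

npf = np.array(fields, dtype=float)   # parses strings to floats during conversion
npw = np.array(weights, dtype=float)  # shape: (num_category, num_fields)

winner = np.argmax(npw.dot(npf))      # the fastest find_best from the timings above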

update_weights

Numpy is likely to offer a similar speed-up for update_weights, doing something like:

def update_weights(weights, fields, to_increase, to_decrease):
  # assumes weights is a 2-D numpy array and fields a 1-D numpy array
  weights[to_increase, :] += fields
  weights[to_decrease, :] -= fields
  return weights

(I haven’t tested or profiled that btw, you need to do that.)
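
A quick sanity check against the original semantics could look like this (my sketch, assuming weights is a 2-D numpy array):

import numpy as np

weights = np.random.rand(100, 3000)
fields = np.random.rand(3000)

expected_inc = weights[3] + fields  # these are copies, unaffected by the in-place update
expected_dec = weights[7] - fields

update_weights(weights, fields, 3, 7)

assert np.allclose(weights[3], expected_inc)
assert np.allclose(weights[7], expected_dec)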

Answered By: huon