Why is a for loop faster than map, reduce and a list comprehension in my case?

Question:

I wrote a simple script that tests the speed, and this is what I found out. The for loop was actually the fastest in my case. That really surprised me; check out the results below (I was calculating the sum of squares). Is that because it holds the list in memory, or is that intended? Can anyone explain this?

from functools import reduce
import datetime


def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print(datetime.datetime.now() - start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next**2, numbers, 0)


def square_sum2(numbers):
    a = 0
    for i in numbers:
        i = i**2
        a += i
    return a

def square_sum3(numbers):
    sqrt = lambda x: x**2
    return sum(map(sqrt, numbers))

def square_sum4(numbers):
    return(sum([i**2 for i in numbers]))


time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])

0:00:00.302000 #Reduce
0:00:00.144000 #For loop
0:00:00.318000 #Map
0:00:00.290000 #List comprehension

Update – these are the results when I tried longer loops.

time_it(square_sum1, 100, range(1000))
time_it(square_sum2, 100, range(1000))
time_it(square_sum3, 100, range(1000))
time_it(square_sum4, 100, range(1000))

0:00:00.068992
0:00:00.062955
0:00:00.069022
0:00:00.057446
Asked By: alphiii


Answers:

Python function calls have overhead that makes them relatively slow, so code that uses a simple expression will always be faster than code that wraps that expression in a function, whether it's a normal def function or a lambda. For that reason it's best to avoid map or reduce when they would need a Python function, if you can do the equivalent job with a plain expression in a for loop, a comprehension, or a generator expression.
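
As a rough sketch of how you could measure that call overhead yourself with the timeit module (the data list and loop counts below are arbitrary, not taken from the original post):

from timeit import timeit

data = list(range(1000))

# Plain expression: no extra Python-level call per element.
t_expr = timeit('sum(i * i for i in data)', globals={'data': data}, number=1000)

# Same arithmetic, but every element goes through a lambda call via map().
t_call = timeit('sum(map(lambda i: i * i, data))', globals={'data': data}, number=1000)

print('expression:', t_expr, 'lambda via map:', t_call)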


There are a couple of minor optimizations that will speed up some of your functions. Don't make unnecessary assignments, e.g.,

def square_sum2a(numbers):
    a = 0
    for i in numbers:
        a += i ** 2
    return a

Also, i * i is quite a bit faster than i ** 2 because multiplication is faster than exponentiation.
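
A quick way to check that claim on your own machine (illustrative only; timeit's default of one million iterations applies here, and the numbers are machine-dependent):

from timeit import timeit

# Both statements square the same small int.
print('i * i :', timeit('i * i', setup='i = 7'))
print('i ** 2:', timeit('i ** 2', setup='i = 7'))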

As I mentioned in the comments, it's more efficient to pass sum a generator than a list comprehension, especially if the loop is large; it probably won't make a difference with a small list of length 8, but it will be quite noticeable with large lists.

sum(i*i for i in numbers)

As Kelly Bundy mentions in the comments, the generator expression version isn’t actually faster than the equivalent list comprehension. Generator expressions are more efficient than list comps in terms of RAM use, but they’re not necessarily faster. And when the sequence length is small, the RAM usage differences are negligible, although there is also the time required to allocate & free the RAM used, which timeit ignores.

I just ran a few tests, with a larger data list. The list comp is still the winner (usually), but the speed differences are generally around 5-10%.
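
If you want to reproduce that comparison yourself, something along these lines should work (the list size and repetition count are arbitrary):

from timeit import timeit

data = list(range(100000))

t_listcomp = timeit('sum([i * i for i in data])', globals={'data': data}, number=100)
t_genexp = timeit('sum(i * i for i in data)', globals={'data': data}, number=100)

print('list comp:', t_listcomp, 'genexp:', t_genexp)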

BTW, you shouldn’t use sum or next as variable names, because that shadows the built-in functions of the same name. It won’t hurt anything here, but it’s still not a good idea, and it makes your code look odd in an editor with more comprehensive syntax highlighting than the SO syntax highlighter.
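
A tiny illustration of the sort of confusion shadowing can cause (hypothetical example, not from your code):

sum = 0                 # shadows the built-in sum()
for x in [1, 2, 3]:
    sum += x
print(sum)              # 6 -- still works here...

try:
    sum([4, 5, 6])      # ...but the built-in is no longer reachable by that name
except TypeError as exc:
    print(exc)          # "'int' object is not callable"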


Here’s a new version of your code that uses the timeit module. It does 3 repetitions of 10,000 loops each and sorts the results. As explained in the timeit docs, the important figure to look at in the series of the repetitions is the minimum one.

In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in.

from timeit import Timer
from functools import reduce

def square_sum1(numbers):
    return reduce(lambda total, u: total + u**2, numbers, 0)

def square_sum1a(numbers):
    return reduce(lambda total, u: total + u*u, numbers, 0)

def square_sum2(numbers):
    a = 0
    for i in numbers:
        i = i**2
        a += i
    return a

def square_sum2a(numbers):
    a = 0
    for i in numbers:
        a += i * i
    return a

def square_sum3(numbers):
    sqr = lambda x: x**2
    return sum(map(sqr, numbers))

def square_sum3a(numbers):
    sqr = lambda x: x*x
    return sum(map(sqr, numbers))

def square_sum4(numbers):
    return(sum([i**2 for i in numbers]))

def square_sum4a(numbers):
    return(sum(i*i for i in numbers))

funcs = (
    square_sum1,
    square_sum1a,
    square_sum2,
    square_sum2a,
    square_sum3,
    square_sum3a,
    square_sum4,
    square_sum4a,
)

data = [1, 2, 5, 3, 1, 2, 5, 3]

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        # Make `data` and the function under test available to the timed code
        setup = 'from __main__ import data, ' + fname
        cmd = fname + '(data)'
        t = Timer(cmd, setup)
        # repeat() returns one total time per repetition, each running `loops` calls
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    # Sort so the fastest function is listed first
    timings.sort()
    for result, fname in timings:
        print('{0:14} {1}'.format(fname, result))

loops, reps = 10000, 3
time_test(loops, reps)

output

square_sum2a   [0.03815755599862314, 0.03817843700016965, 0.038571521999983815]
square_sum4a   [0.06384095800240175, 0.06462285799716483, 0.06579178199899616]
square_sum3a   [0.07395686000018031, 0.07405958899835241, 0.07463337299850537]
square_sum1a   [0.07867341000019223, 0.0788448769999377, 0.07908406700153137]
square_sum2    [0.08781023399933474, 0.08803317899946705, 0.08846573399932822]
square_sum4    [0.10260082300010254, 0.10360279499946046, 0.10415067900248687]
square_sum3    [0.12363515399920288, 0.12434166299863136, 0.1273790529994585]
square_sum1    [0.1276186039976892, 0.13786184099808452, 0.16315817699796753]

The results were obtained on an old single core 32 bit 2GHz machine running Python 3.6.0 on Linux.

Answered By: PM 2Ring

This is almost independent of the underlying programming language: abstractions do not come for free.

Meaning: there is always a certain cost to calling a function or method. A stack frame needs to be set up; control flow needs to “jump”. And when you think about lower levels, such as the CPU, the code for that method probably needs to be loaded, and so on.

In other words: when your primary requirement is hard-core number crunching, you have to balance ease of use against the cost of the corresponding abstractions.

Beyond that: if you focus on speed, you should look beyond Python, or at least beyond “ordinary” Python, and instead turn to modules such as numpy.
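
For instance, a sum of squares in numpy (assuming numpy is installed) pushes the whole loop down into compiled code; this is a sketch, not a benchmark:

import numpy as np

numbers = np.arange(1000)                    # or np.asarray(your_existing_list)
square_sum = int(np.dot(numbers, numbers))   # sum of squares, no Python-level loop
# equivalently: int((numbers ** 2).sum())
print(square_sum)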

Answered By: GhostCat