Why is looping over range() in Python faster than using a while loop?
Question:
The other day I was doing some Python benchmarking and I came across something interesting. Below are two loops that do more or less the same thing. Loop 1 takes about twice as long as loop 2 to execute.
Loop 1:
i = 0
while i < 100000000:
    i += 1
Loop 2:
for n in range(0, 100000000):
    pass
Why is the first loop so much slower? I know it’s a trivial example but it’s piqued my interest. Is there something special about the range() function that makes it more efficient than incrementing a variable the same way?
Answers:
Because with the for loop, more of the work runs in C code inside the interpreter. That is, i += 1 is executed as Python bytecode, which is comparatively slow, whereas range(0,…) is a single C call, and the for loop that consumes it executes mostly in C too.
range() is implemented in C, whereas i += 1 is interpreted.
Using xrange() could make it even faster for large numbers. Starting with Python 3.0, range() is the same as what xrange() used to be.
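To make the xrange() point concrete, here is a small sketch showing that Python 3's range() is a lazy object rather than a materialized list; the specific operations shown (indexing, len, membership) are standard range behavior in CPython:

```python
import sys

# In Python 3, range() is lazy, like Python 2's xrange(): values are computed
# on demand instead of being materialized as a list in memory.
r = range(0, 100000000)

print(r[5])              # indexing is O(1)
print(len(r))            # the length is known without iterating
print(500 in r)          # membership testing for ints is O(1)
print(sys.getsizeof(r))  # a small constant size, regardless of the range's length
```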
Look at the disassembly of the Python bytecode and you may get a more concrete idea.
Using a while loop:
1 0 LOAD_CONST 0 (0)
3 STORE_NAME 0 (i)
2 6 SETUP_LOOP 28 (to 37)
>> 9 LOAD_NAME 0 (i) # <-
12 LOAD_CONST 1 (100000000) # <-
15 COMPARE_OP 0 (<) # <-
18 JUMP_IF_FALSE 14 (to 35) # <-
21 POP_TOP # <-
3 22 LOAD_NAME 0 (i) # <-
25 LOAD_CONST 2 (1) # <-
28 INPLACE_ADD # <-
29 STORE_NAME 0 (i) # <-
32 JUMP_ABSOLUTE 9 # <-
>> 35 POP_TOP
36 POP_BLOCK
The loop body has 10 opcodes.
Using range:
1 0 SETUP_LOOP 23 (to 26)
3 LOAD_NAME 0 (range)
6 LOAD_CONST 0 (0)
9 LOAD_CONST 1 (100000000)
12 CALL_FUNCTION 2
15 GET_ITER
>> 16 FOR_ITER 6 (to 25) # <-
19 STORE_NAME 1 (n) # <-
2 22 JUMP_ABSOLUTE 16 # <-
>> 25 POP_BLOCK
>> 26 LOAD_CONST 2 (None)
29 RETURN_VALUE
The loop body has 3 opcodes.
The time spent running C code is much shorter than the time spent in the interpreter, and can be ignored.
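You can reproduce a disassembly like the above yourself with the standard library's dis module. The listing above is from an old interpreter; opcode names vary across Python versions, but the overall shape is the same:

```python
import dis

def while_loop():
    i = 0
    while i < 100000000:
        i += 1

def for_loop():
    for n in range(0, 100000000):
        pass

# The while version's body always contains separate load/compare/increment/store
# steps, while the for version collapses them into a single FOR_ITER opcode.
dis.dis(while_loop)
dis.dis(for_loop)
```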
Most of Python’s built in method calls are run as C code. Code that has to be interpreted is much slower. In terms of memory efficiency and execution speed the difference is gigantic. The python internals have been optimized to the extreme, and it’s best to take advantage of those optimizations.
It must be said that there is a lot of object creation and destruction going on with the while loop.
i += 1
is the same as:
i = i + 1
But because Python ints are immutable, it doesn’t modify the existing object; rather it creates a brand new object with a new value. It’s basically:
i = new int(i + 1) # Using C++ or Java-ish syntax
The garbage collector will also have a large amount of cleanup to do.
“Object creation is expensive.”
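You can observe this object churn directly by watching an int's identity change across an increment. This is a CPython implementation detail, and small ints (roughly -5 to 256) are cached, so the sketch uses a large value:

```python
# CPython caches small ints, so use a large value to make the effect visible.
i = 10 ** 9
before = id(i)
i += 1
print(i == 10 ** 9 + 1)   # True: the value advanced
print(id(i) != before)    # True: but i is now bound to a different int object
```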
I think the answer here is a little more subtle than the other answers suggest, though the gist of it is correct: the for loop is faster because more of the operations happen in C and less in Python.
More specifically, in the for loop case, two things happen in C that in the while loop are handled in Python:
- In the while loop, the comparison i < 100000000 is executed in Python, whereas in the for loop, the job is passed to the iterator of range(100000000), which internally does the iteration (and hence the bounds check) in C.
- In the while loop, the loop update i += 1 happens in Python, whereas in the for loop, the iterator of range(100000000), again written in C, does the i += 1 (or ++i).
We can see that it is a combination of both of these things that makes the for loop faster by manually adding them back to see the difference.
import timeit

N = 100000000

def while_loop():
    i = 0
    while i < N:
        i += 1

def for_loop_pure():
    for i in range(N):
        pass

def for_loop_with_increment():
    for i in range(N):
        i += 1

def for_loop_with_test():
    for i in range(N):
        if i < N: pass

def for_loop_with_increment_and_test():
    for i in range(N):
        if i < N: pass
        i += 1

def main():
    print('while loop\t\t', timeit.timeit(while_loop, number=1))
    print('for pure\t\t', timeit.timeit(for_loop_pure, number=1))
    print('for inc\t\t\t', timeit.timeit(for_loop_with_increment, number=1))
    print('for test\t\t', timeit.timeit(for_loop_with_test, number=1))
    print('for inc+test\t', timeit.timeit(for_loop_with_increment_and_test, number=1))

if __name__ == '__main__':
    main()
I tried this both with the number 100000000 as an inline literal constant and with it as a variable N, as would be more typical.
# inline constant N
while loop 3.5131139
for pure 1.3211338000000001
for inc 3.5477727000000003
for test 2.5209639
for inc+test 4.697028999999999
# variable N
while loop 4.1298240999999996
for pure 1.3526357999999998
for inc 3.6060175
for test 3.1093069
for inc+test 5.4753364
As you can see, in both cases the while time is very close to the difference between for inc+test and for pure. Note also that in the case where we use the variable N, the while loop has an additional slowdown from repeatedly looking up the value of N, but the for loop does not.
It’s really crazy that such trivial modifications can result in over 3x code speedup, but that’s Python for you. And don’t even get me started on when you can use a builtin over a loop at all….
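As an illustration of the builtin point, here is a sketch comparing a manual accumulation loop against the builtin sum(), which consumes the range iterator entirely in C (the function names and the modest n are my own choices for the demo):

```python
import timeit

def manual_sum(n=10000000):
    total = 0
    for i in range(n):   # each addition runs as interpreted bytecode
        total += i
    return total

def builtin_sum(n=10000000):
    return sum(range(n))  # the whole loop happens inside the C builtin

# Same result, but the builtin version is typically several times faster.
print(timeit.timeit(manual_sum, number=1))
print(timeit.timeit(builtin_sum, number=1))
```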