Python 2: Why is floor division operator faster than normal division operator?

Question:

Consider the following Python 2 code:

from timeit import default_timer

def floor():
    for _ in xrange(10**7):
        1 * 12 // 39 * 2 // 39 * 23 - 234

def normal():
    for _ in xrange(10**7):
        1 * 12 / 39 * 2 / 39 * 23 - 234

t1 = default_timer()
floor()
t2 = default_timer()
normal()
t3 = default_timer()

print 'Floor  %.3f' % (t2 - t1)
print 'Normal %.3f' % (t3 - t2)

And the output, on my computer, is

Floor  0.254
Normal 1.766

So, why is the floor division operator // faster than the normal division operator / when both of them are doing the same thing?

Asked By: avamsi


Answers:

You can examine the compiled bytecode of a particular Python function using the dis module:

import dis

def floor():
  12 // 39

def normal():
  12 / 39

>>> dis.dis(floor)
  2           0 LOAD_CONST               3 (0)
              3 POP_TOP             
              4 LOAD_CONST               0 (None)
              7 RETURN_VALUE        

>>> dis.dis(normal)
  2           0 LOAD_CONST               1 (12)
              3 LOAD_CONST               2 (39)
              6 BINARY_DIVIDE       
              7 POP_TOP             
              8 LOAD_CONST               0 (None)
             11 RETURN_VALUE     
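A quick way to check whether the compiler folded an expression, without reading the listing by eye, is to compile it and look for binary-operation opcodes. This is a minimal sketch for Python 3.4+ (dis.get_instructions does not exist in Python 2.7); the helper name has_binary_op is my own. Opcode names differ between versions (BINARY_FLOOR_DIVIDE in older releases, a generic BINARY_OP from 3.11 on), so it only checks the common BINARY prefix.

```python
import dis


def has_binary_op(source):
    """Return True if compiling `source` leaves a binary operation in the bytecode.

    A folded constant expression compiles to a single constant load, so no
    BINARY_* instruction survives. Opcode names vary by CPython version
    (e.g. BINARY_FLOOR_DIVIDE before 3.11, BINARY_OP from 3.11 on), so only
    the shared "BINARY" prefix is checked.
    """
    code = compile(source, "<expr>", "eval")
    return any(ins.opname.startswith("BINARY") for ins in dis.get_instructions(code))


print(has_binary_op("12 // 39"))  # constants only: folded at compile time
print(has_binary_op("a // b"))    # names: must be computed at run time
print(has_binary_op("12 / 39"))   # Python 3: / is true division, folded as well
```

On Python 3 the third check should also report no binary operation, because / there always means true division and is folded just like //.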
Answered By: Hamms

The Python interpreter is pre-calculating the expression inside the loop in floor, but not in normal.

Here’s the code for floor:

>>> dis.dis(floor)

  5           0 SETUP_LOOP              24 (to 27)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_CONST               9 (10000000)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                10 (to 26)
             16 STORE_FAST               0 (_)

  6          19 LOAD_CONST              15 (-234)
             22 POP_TOP             
             23 JUMP_ABSOLUTE           13
        >>   26 POP_BLOCK           
        >>   27 LOAD_CONST               0 (None)
             30 RETURN_VALUE        

You can see that the whole expression has already been calculated: LOAD_CONST 15 (-234).

Here’s the same for normal:

>>> dis.dis(normal)

  9           0 SETUP_LOOP              44 (to 47)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_CONST               9 (10000000)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                30 (to 46)
             16 STORE_FAST               0 (_)

 10          19 LOAD_CONST              10 (12)
             22 LOAD_CONST               5 (39)
             25 BINARY_DIVIDE       
             26 LOAD_CONST               6 (2)
             29 BINARY_MULTIPLY     
             30 LOAD_CONST               5 (39)
             33 BINARY_DIVIDE       
             34 LOAD_CONST               7 (23)
             37 BINARY_MULTIPLY     
             38 LOAD_CONST               8 (234)
             41 BINARY_SUBTRACT     
             42 POP_TOP             
             43 JUMP_ABSOLUTE           13
        >>   46 POP_BLOCK           
        >>   47 LOAD_CONST               0 (None)
             50 RETURN_VALUE        

This time the calculation is only partially simplified (e.g., 1 * 12 has been folded to 12), and the remaining operations are performed at run time on every iteration of the loop.

It looks like Python 2.7 doesn’t do constant folding on expressions containing the ambiguous / operator (which may be integer or float division depending on its operands). Adding from __future__ import division at the top of the program causes the constant to be folded in normal just as it was in floor (although the result is different, of course, since / is now float division):

>>> dis.dis(normal)
 10           0 SETUP_LOOP              24 (to 27)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_CONST               9 (10000000)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                10 (to 26)
             16 STORE_FAST               0 (_)

 11          19 LOAD_CONST              15 (-233.6370808678501)
             22 POP_TOP             
             23 JUMP_ABSOLUTE           13
        >>   26 POP_BLOCK           
        >>   27 LOAD_CONST               0 (None)
             30 RETURN_VALUE        
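To see why both of the question’s expressions give -234 under Python 2 while the folded constant above changes once / becomes true division, here is a small Python 3 sketch that emulates Python 2’s classic / (floor division when both operands are integers, true division otherwise). classic_div and question_expr are hypothetical helpers written for this illustration, not standard library functions, and the emulation ignores Python 2’s separate long type and the -Qnew flag.

```python
from operator import floordiv, truediv


def classic_div(a, b):
    """Emulate Python 2's classic '/' operator (hypothetical helper).

    int / int floors like '//'; if either operand is a float, '/' performs
    true division. (bool is a subclass of int and is treated the same way.)
    """
    if isinstance(a, int) and isinstance(b, int):
        return a // b
    return a / b


def question_expr(div):
    """Evaluate 1 * 12 DIV 39 * 2 DIV 39 * 23 - 234 with the given division."""
    # * and / (or //) share one precedence level and associate left to right,
    # so the expression groups as ((((1 * 12) DIV 39) * 2) DIV 39) * 23 - 234.
    return div(div(1 * 12, 39) * 2, 39) * 23 - 234


print(question_expr(floordiv))     # what '//' computes
print(question_expr(classic_div))  # what Python 2's default '/' computes
print(question_expr(truediv))      # what '/' computes with true division
```

The first two calls give -234; the third performs the same sequence of float operations as the folded -233.6370808678501 constant in the listing above.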

It’s not that the interpreter couldn’t do constant folding with the default / operator; it just doesn’t. (CPython 2.7’s peephole optimizer deliberately skips BINARY_DIVIDE, because the result can depend on whether the interpreter was started with the -Qnew flag, which makes / behave as true division at run time.)

Answered By: Paul Hankin

“Produce the same result” doesn’t imply “implemented the same way”.
Also note that these operators don’t always produce the same result, as explained here:

Why Python’s Integer Division Floors
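A few concrete cases where / and // differ, as a Python 3 sketch (in Python 3, / is always true division; in Python 2, int / int already floors, so the integer cases there behave like //). The helper division_variants is hypothetical, and math.trunc stands in for C-style truncation toward zero, which is the contrast the linked article discusses.

```python
import math


def division_variants(a, b):
    """Return (true quotient, floored quotient, C-style truncated quotient).

    math.trunc(a / b) goes through a float, which is fine for these small
    illustrative values but not exact for very large integers.
    """
    return a / b, a // b, math.trunc(a / b)


print(division_variants(7.0, 2))  # (3.5, 3.0, 3): '//' on floats still returns a float
print(division_variants(-7, 2))   # (-3.5, -4, -3): flooring vs truncation toward zero

# Python keeps the remainder consistent with flooring: (a // b) * b + a % b == a
a, b = -7, 2
print((a // b) * b + a % b == a)  # True
```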

So performance is largely implementation-dependent.
Usually hardware floating-point division takes longer than integer division.
It might be that Python’s classic division (which you call normal) is implemented with hardware floating-point division and only truncated back to an integer in the final stage, while floor division (which you call floor) is implemented with hardware integer division, which is a lot faster.
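If you want to compare the operators themselves rather than the effect of constant folding, the operands have to be variables so the compiler cannot precompute the result. This is a minimal sketch using the standard timeit module under Python 3, where / on two ints is true division and returns a float; time_division is a hypothetical helper, and the timings depend heavily on the machine and interpreter version, so no particular result is implied.

```python
import timeit


def time_division(number=1_000_000):
    """Time a // b against a / b with operands the compiler cannot fold.

    The operands are assigned in the setup code, so they are plain variables
    and every iteration really executes the division at run time.
    Returns the best of three runs for each statement, in seconds.
    """
    setup = "a, b = 12, 39"
    return {
        "floor_div": min(timeit.repeat("a // b", setup=setup, number=number, repeat=3)),
        "true_div": min(timeit.repeat("a / b", setup=setup, number=number, repeat=3)),
    }


if __name__ == "__main__":
    for name, seconds in time_division().items():
        print("%-10s %.3f s" % (name, seconds))
```

Taking the minimum of several repeats reduces noise from other processes; it still measures the whole interpreter dispatch, not just the hardware divide instruction.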

Answered By: Uri Brecher

I think I’ll start at primary school. When you were learning to add, subtract and multiply, you could easily do it by counting on your fingers, and you could multiply by adding several times. However, when you learned to divide, you probably ran into more annoying algorithms such as long division, which takes multiple steps of integer-dividing numbers and their factors until you are left with a remainder of zero or one smaller than the divisor.
This is because it’s genuinely harder to divide a number than to multiply, add or subtract numbers, and we often have to do multiple operations that estimate the quotient, getting closer and closer to the true value. The algorithm will often perform many more steps after the decimal point to find the tenths, hundredths, etc. digit, and it requires an operation for each position. (There are more efficient algorithms for this, but they all take more time to find more decimal places.)
Therefore, if we do integer division instead, we can halt the algorithm once it has found the digit in the ones position. This means it can skip the ‘infinite’ remaining decimal places, making it a lot more efficient. (I put ‘infinite’ in quotation marks because the algorithm generally stops after a certain number of positions, or when it finds the point where the digits repeat endlessly, as every rational number eventually does.)
Halting the algorithm early makes it a lot faster. There is also less information needed to find the answer (since everything after the decimal point is unimportant), so it’s probably possible to find a more efficient algorithm for that problem.
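The schoolbook long division described above can be sketched in Python as follows. long_divide is a hypothetical helper for illustration only (CPython does not divide decimal digit by decimal digit like this); it just shows that the integer part costs one step per digit of the dividend, and every extra decimal place costs one more step.

```python
def long_divide(dividend, divisor, places=0):
    """Schoolbook long division of a non-negative int by a positive int.

    Returns (quotient_text, steps), where steps counts the
    "bring down a digit, divide, keep the remainder" rounds performed.
    places=0 is plain integer division: stop after the ones position.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("sketch only handles dividend >= 0 and divisor > 0")
    quotient_digits = []
    remainder = 0
    steps = 0
    # Integer part: bring down one digit of the dividend at a time.
    for ch in str(dividend):
        remainder = remainder * 10 + int(ch)
        quotient_digits.append(str(remainder // divisor))
        remainder %= divisor
        steps += 1
    whole = str(int("".join(quotient_digits)))  # strip leading zeros
    # Fractional part: each requested decimal place needs one more round.
    fraction = []
    for _ in range(places):
        if remainder == 0:  # the decimal terminated early
            break
        remainder *= 10     # bring down a zero
        fraction.append(str(remainder // divisor))
        remainder %= divisor
        steps += 1
    text = whole + ("." + "".join(fraction) if fraction else "")
    return text, steps


print(long_divide(12, 39))     # integer division only
print(long_divide(12, 39, 5))  # five more rounds for five decimal places
```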

Answered By: Jffrysith