Python Dictionary vs If Statement Speed

Question:

I have found a few links saying that switch cases are faster in C++ than if/else because they can be optimized at compile time. I then found suggestions that using a dictionary may be faster than an if statement. However, most of those conversations are about someone's existing code and just end up concluding that they should optimize other parts of the code first, and that it won't matter unless they're doing millions of if/else checks. Can anyone explain why this is?

Say I have 100 unique numbers that are going to be streamed into a Python program constantly. I want to check which number it is, then execute something. So I could either write a ton of if/else statements, or I could put each number in a dictionary. For argument's sake, let's say it's a single thread.

Does someone understand the layer between Python and the low-level execution well enough to explain how this works?

Thanks 🙂

Asked By: user-2147482637


Answers:

The if/elif/else structure compares the key it was given to a sequence of possible values one by one until it finds a match in the condition of some if statement, then reads what it is supposed to execute from inside the if block. This can take a long time, because so many checks (n/2 on average, for n possible values) have to be made for every lookup.
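That linear scan can be made visible by counting comparisons. A toy sketch (the function and names here are hypothetical, just simulating the chain):

```python
def count_comparisons(keys, target):
    """Simulate an if/elif chain: compare target against each key in order."""
    comparisons = 0
    for key in keys:
        comparisons += 1
        if key == target:
            break
    return comparisons

keys = list(range(100))
# Matching the first key costs 1 comparison; the last costs 100.
assert count_comparisons(keys, 0) == 1
assert count_comparisons(keys, 99) == 100
# Averaged over all keys, roughly n/2 comparisons per lookup.
avg = sum(count_comparisons(keys, k) for k in keys) / len(keys)
assert avg == 50.5
```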

The reason that a sequence of if statements is more difficult to optimize than a switch statement is that the condition checks (what’s inside the parens in C++) might conceivably change the state of some variable that’s involved in the next check, so you have to do them in order. The restrictions on switch statements remove that possibility, so the order doesn’t matter (I think).


Python dictionaries are implemented as hash tables. The idea is this: if you could deal with arbitrarily large numbers and had infinite RAM, you could create a huge array of function pointers that is indexed just by casting whatever your lookup value is to an integer and using that as the index. Lookup would be virtually instantaneous.
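With small integer keys like the 100 numbers in the question, you can actually realize this idea directly in Python using a list as a jump table (the handler functions below are placeholders, Python 3 syntax):

```python
def handle_one():
    return "one"

def handle_two():
    return "two"

def handle_three():
    return "three"

# Index 0 is unused so the key itself doubles as the index.
JUMP_TABLE = [None, handle_one, handle_two, handle_three]

def dispatch(x):
    # A single index operation replaces the whole if/elif chain.
    return JUMP_TABLE[x]()
```

This only works when the keys are small, dense integers; a dictionary generalizes the same idea to arbitrary hashable keys.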

You can’t do that, of course, but you can create an array of some manageable length, pass the lookup value to a hash function (which generates some integer, depending on the lookup value), then % your result with the length of your array to get an index within the bounds of that array. That way, lookup takes as much time as is needed to call the hash function once, take the modulus, and jump to an index. If the amount of different possible lookup values is large enough, the overhead of the hash function becomes negligible compared to those n/2 condition checks.
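The hash-then-modulus step can be sketched in a couple of lines (the table size is an arbitrary choice for illustration; real hash tables pick and grow it dynamically):

```python
TABLE_SIZE = 8  # arbitrary small size for illustration

def slot_for(key):
    # hash() produces an integer; % folds it into the table's bounds.
    return hash(key) % TABLE_SIZE

# Every hashable key maps to a valid index, regardless of its value.
for key in (42, "hello", (1, 2)):
    assert 0 <= slot_for(key) < TABLE_SIZE
```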

(Actually, since many different lookup values will inevitably map to the same index, it’s not quite that simple. You have to check for and resolve possible conflicts, which can be done in a number of ways. Still, the gist of it is as described above.)
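One classic way to resolve those conflicts is separate chaining, where each slot holds a small list of (key, value) pairs. A minimal sketch (note this is not how CPython's dict actually resolves collisions; CPython uses open addressing):

```python
class ChainedHashTable:
    """Toy hash table using separate chaining for collision resolution."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError(key)

t = ChainedHashTable(size=4)  # tiny table forces collisions
for n in range(10):
    t.put(n, n * n)
assert t.get(7) == 49
```

As long as the buckets stay short (which resizing guarantees in practice), lookup remains close to constant time.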

Answered By: alcedine

However, most of those conversations are about someone's existing code
and just end up concluding that they should optimize other parts of the
code first, and that it won't matter unless they're doing millions of
if/else checks. Can anyone explain why this is?

Generally, you should only bother to optimize code if you really need to, i.e. if the program’s performance is unusably slow.

If this is the case, you should use a profiler to determine which parts are actually causing the most problems. For Python, the cProfile module is pretty good for this.
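For example, you can profile any callable and print the hottest functions (Python 3 syntax; `slow_sum` is just a stand-in workload):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Print the top entries, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```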

Does someone understand the layer between Python and the low-level
execution well enough to explain how this works?

If you want to get an idea of how your code executes, take a look at the dis module.

A quick example (Python 2 syntax, matching the Python 2 bytecode shown below)…

import dis

# Here are the things we might want to do
def do_something_a():
    print 'I did a'


def do_something_b():
    print 'I did b'


def do_something_c():
    print 'I did c'


# Case 1
def f1(x):
    if x == 1:
        do_something_a()
    elif x == 2:
        do_something_b()
    elif x == 3:
        do_something_c()


# Case 2
FUNC_MAP = {1: do_something_a, 2: do_something_b, 3: do_something_c}
def f2(x):
    FUNC_MAP[x]()


# Show how the functions execute
print 'Case 1'
dis.dis(f1)
print '\n\nCase 2'
dis.dis(f2)

…which outputs…

Case 1
 18           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 COMPARE_OP               2 (==)
              9 POP_JUMP_IF_FALSE       22

 19          12 LOAD_GLOBAL              0 (do_something_a)
             15 CALL_FUNCTION            0
             18 POP_TOP
             19 JUMP_FORWARD            44 (to 66)

 20     >>   22 LOAD_FAST                0 (x)
             25 LOAD_CONST               2 (2)
             28 COMPARE_OP               2 (==)
             31 POP_JUMP_IF_FALSE       44

 21          34 LOAD_GLOBAL              1 (do_something_b)
             37 CALL_FUNCTION            0
             40 POP_TOP
             41 JUMP_FORWARD            22 (to 66)

 22     >>   44 LOAD_FAST                0 (x)
             47 LOAD_CONST               3 (3)
             50 COMPARE_OP               2 (==)
             53 POP_JUMP_IF_FALSE       66

 23          56 LOAD_GLOBAL              2 (do_something_c)
             59 CALL_FUNCTION            0
             62 POP_TOP
             63 JUMP_FORWARD             0 (to 66)
        >>   66 LOAD_CONST               0 (None)
             69 RETURN_VALUE


Case 2
 29           0 LOAD_GLOBAL              0 (FUNC_MAP)
              3 LOAD_FAST                0 (x)
              6 BINARY_SUBSCR
              7 CALL_FUNCTION            0
             10 POP_TOP
             11 LOAD_CONST               0 (None)
             14 RETURN_VALUE

…so it’s pretty easy to see which function has to execute the most instructions.

As for which is actually faster, that’s something you’d have to check by profiling the code.
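A quick way to do that is the timeit module (Python 3 syntax here, unlike the example above; the timings will vary by machine, and with only a handful of branches the if/elif chain may well hold its own):

```python
import timeit

def f_if(x):
    if x == 1:
        return "a"
    elif x == 2:
        return "b"
    elif x == 3:
        return "c"

FUNC_MAP = {1: lambda: "a", 2: lambda: "b", 3: lambda: "c"}

def f_dict(x):
    return FUNC_MAP[x]()

# x == 3 is the worst case for the chain: all three comparisons run.
t_if = timeit.timeit(lambda: f_if(3), number=100_000)
t_dict = timeit.timeit(lambda: f_dict(3), number=100_000)
print("if/elif: %.4fs  dict: %.4fs" % (t_if, t_dict))
```

The dictionary's advantage grows with the number of branches, since its lookup cost stays flat while the chain's grows linearly.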

Answered By: Aya
