Does Python optimize function calls from loops?

Question:

Say I have code which calls some function millions of times from a loop, and I want the code to be fast:

def outer_function(file):
    for line in file:
        inner_function(line)

def inner_function(line):
    # do something
    pass

It’s not necessarily file processing; it could be, for example, a function that draws a point, called from a function that draws a line. The idea is that logically these two have to be separate, but from a performance point of view they should act together as fast as possible.

Does Python detect and optimize such things automatically? If not – is there a way to give it a hint to do so? Use some additional external optimizer, maybe?

Asked By: lithuak


Answers:

Calling a function to invoke the pass statement obviously carries a fairly high (∞) overhead. Whether your real program suffers undue overhead depends on the size of the inner function. If it really is just setting a pixel, then I’d suggest a different approach that uses drawing primitives coded in a native language like C or C++.
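For instance, here is a minimal sketch of that idea using Pillow (my own assumption – any library exposing C-coded drawing primitives would do): instead of calling a Python-level put-pixel function for every point on a line, hand the whole line to the primitive.

from PIL import Image, ImageDraw

img = Image.new("RGB", (100, 100))
draw = ImageDraw.Draw(img)

# One call into Pillow's C-coded line primitive, rather than a
# Python loop invoking a draw_point() function once per pixel.
draw.line((0, 0, 99, 99), fill=(255, 255, 255))
img.save("diagonal.png")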

There are (somewhat experimental) JIT compilers for Python that will optimise function calls, but mainstream Python won’t do this.

Answered By: Marcelo Cantos

If by “Python” you mean CPython, the most commonly used implementation, no.

If by “Python” you happened to mean any implementation of the Python language, yes. PyPy can optimise a lot, and I believe its tracing JIT should take care of cases like this.

Answered By: Chris Morgan

Python does not inline function calls, because of its dynamic nature. Theoretically, inner_function can do something that re-binds the name inner_function to something else – Python has no way to know at compile time whether this will happen. For example:

def func1():
    global inner_func
    inner_func = func2   # re-bind the global name to a different function
    print(1)

def func2():
    print(2)

inner_func = func1

for i in range(5):
    inner_func()

Prints:

1
2
2
2
2

You may think this is horrible. Then, think again – Python’s functional and dynamic nature is one of its most appealing features. A lot of what Python allows comes at the cost of performance, and in most cases this is acceptable.

That said, you can probably hack something together using a tool like byteplay or similar – disassemble the inner function into bytecode and insert it into the outer function, then reassemble. On second thought, if your code is performance-critical enough to warrant such hacks, just rewrite it in C. Python has great options for FFI.
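To illustrate the FFI route, here is a minimal sketch using the standard ctypes module. It borrows libc’s strlen as a stand-in for a real C implementation of the inner function (this assumes a POSIX system; in practice you would compile your own C code into a shared library and load that instead):

import ctypes
import ctypes.util

# Load the C standard library and declare strlen's signature.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def outer_function(lines):
    strlen = libc.strlen          # bind once, outside the loop
    total = 0
    for line in lines:
        total += strlen(line)     # each call goes straight into C
    return total

print(outer_function([b"hello", b"world"]))  # -> 10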


This is all relevant to the official CPython implementation. A runtime-JITting interpreter (like PyPy or the sadly defunct Unladen Swallow) can in theory detect the normal case and perform inlining. Alas, I’m not familiar enough with PyPy to know whether it does this, but it definitely can.

Answered By: Eli Bendersky

Which Python? PyPy’s JIT compiler will – after a few dozen to a few hundred iterations (depending on how many opcodes are executed on each iteration) – start tracing execution, forget about Python function calls along the way, and compile the gathered information into a piece of optimized machine code that likely contains no remnant of the logic that made the function call itself happen. Traces are linear; the JIT’s backend doesn’t even know there was a function call, it just sees the instructions from both functions mixed together as they were executed. (This is the ideal case, e.g. when there is no branching in the loop, or when all iterations take the same branch. Some code is unsuited to this kind of JIT compilation and invalidates the traces quickly, before they yield much speedup, although this is rather rare.)

Now, CPython, which is what many people mean when they speak of “Python” or the Python interpreter, isn’t that clever. It’s a straightforward bytecode VM and will dutifully execute the logic associated with calling a function again and again on each iteration. But then again, why are you using an interpreter at all if performance is that important? Consider writing the hot loop in native code (e.g. as a C extension or in Cython) if it’s that important to keep such overhead as low as humanly possible.

Unless you’re doing only a tiny bit of number crunching per iteration, though, you won’t get large improvements either way.
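If you want to gauge the overhead on your own machine, here is a minimal sketch using the standard timeit module (the function bodies are illustrative stand-ins for real work):

import timeit

def inner(x):
    return x + 1

def with_call(n):
    # The work is hidden behind a function call on every iteration.
    total = 0
    for _ in range(n):
        total = inner(total)
    return total

def inlined(n):
    # The same work with the call manually inlined.
    total = 0
    for _ in range(n):
        total = total + 1
    return total

n = 1000000
print("with call:", timeit.timeit(lambda: with_call(n), number=10))
print("inlined:  ", timeit.timeit(lambda: inlined(n), number=10))

The gap you see on CPython is the per-call overhead described above; it matters less and less as the real work per iteration grows.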

Answered By: user395760

CPython (the “standard” Python implementation) doesn’t do this kind of optimization.

Note, however, that if you are counting the CPU cycles of a function call, then CPython is probably not the right tool for your problem. If you are 100% sure that the algorithm you are going to use is already the best one (this is the most important thing), and your computation really is CPU bound, then your options include:

  • Using PyPy instead of CPython
  • Using Cython
  • Writing a C++ module and interfacing it with SIP
  • If possible, implementing your algorithm with NumPy’s vectorized array operations (a minimal sketch follows this list)
  • If possible, moving the computation onto GPU hardware, for example with PyCUDA
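
As a minimal sketch of the NumPy option above (assuming the inner function can be expressed as element-wise array arithmetic; the function 3*x + 1 is just a placeholder):

import numpy as np

def inner_function(x):
    return 3 * x + 1        # scalar version, called once per element

data = np.arange(1000000)

# Instead of the Python-level loop:
#     result = [inner_function(v) for v in data]
# one vectorized expression runs the loop in C:
result = 3 * data + 1
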
Answered By: 6502