Python equivalence to inline functions or macros

Question:

I just realized that doing

x.real*x.real+x.imag*x.imag

is three times faster than doing

abs(x)**2

where x is a numpy array of complex numbers. For code readability, I could define a function like

def abs2(x):
    return x.real*x.real+x.imag*x.imag

which is still far faster than abs(x)**2, but it comes at the cost of a function call. Is it possible to inline such a function, as I would do in C with a macro or the inline keyword?

Asked By: Charles Brunet

Answers:

No.

The closest you can get to C macros is a script (awk or other) that you include in a makefile and that replaces a certain pattern, like abs(x)**2, in your Python scripts with the long form.
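As a rough sketch of that preprocessing idea, written in Python rather than awk: the `expand_abs_squared` helper and its regular expression below are hypothetical, and they only cover the simple case where the argument is a bare name such as `x`.

```python
import re

# Hypothetical pattern: matches abs(NAME)**2 where NAME is a plain identifier.
ABS_SQUARED = re.compile(r"\babs\(([A-Za-z_]\w*)\)\s*\*\*\s*2")


def expand_abs_squared(source):
    """Replace abs(NAME)**2 with the explicit real/imag form in source text."""
    return ABS_SQUARED.sub(r"(\1.real*\1.real + \1.imag*\1.imag)", source)


print(expand_abs_squared("y = abs(x)**2"))
# prints: y = (x.real*x.real + x.imag*x.imag)
```

A text substitution like this leaves nested expressions such as abs(a + b)**2 untouched; handling those would need a real parser.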

Answered By: Thaddee Tyl

Is it possible to inline such a function, as I would do in C with a macro or the inline keyword?

No. Before reaching this specific instruction, Python interpreters don’t even know if there’s such a function, much less what it does.

As noted in the comments, PyPy inlines automatically. The above still holds: it “simply” generates an optimized version at runtime, benefits from it, and breaks out of it when it is invalidated. In this specific case that doesn’t help, though, because the NumPy implementation for PyPy was started only recently and isn’t even beta level to this day. But the bottom line is: don’t worry about optimizations at this level in Python. Either the implementations optimize it themselves or they don’t; it’s not your responsibility.
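The point about the interpreter not knowing the function ahead of time can be illustrated with a small sketch. The name `magnitude_squared` is made up for this example; the idea is that a global name like `abs2` is looked up again on every call, so it can be rebound at any time and a static inliner could not safely copy its body into the caller.

```python
def abs2(z):
    return z.real * z.real + z.imag * z.imag


def magnitude_squared(z):
    # The name abs2 is looked up in the module globals on every call.
    return abs2(z)


before = magnitude_squared(3 + 4j)


# Rebind the global name; magnitude_squared itself is unchanged.
def abs2(z):
    return -1.0


after = magnitude_squared(3 + 4j)
print(before, after)  # prints: 25.0 -1.0
```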

Answered By: user395760

Actually it might be even faster to calculate, like:

x.real** 2+ x.imag** 2

Thus, the extra cost of the function call is likely to be negligible. Let's see:

In []: n= 10000
In []: x= randn(n, 1)+ 1j* rand(n, 1)
In []: %timeit x.real* x.real+ x.imag* x.imag
10000 loops, best of 3: 100 us per loop
In []: %timeit x.real** 2+ x.imag** 2
10000 loops, best of 3: 77.9 us per loop

And encapsulating the calculation in a function:

In []: def abs2(x):
   ..:     return x.real** 2+ x.imag** 2
   ..: 
In []: %timeit abs2(x)
10000 loops, best of 3: 80.1 us per loop

Anyway (as others have pointed out), this kind of micro-optimization (in order to avoid a function call) is not really a productive way to write Python code.
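To repeat this comparison outside IPython, a minimal sketch using the standard `timeit` module might look like the following. The array size matches the session above; the seed, the `candidates` dictionary, and the repeat counts are arbitrary choices for illustration, and timings will vary with the machine and the NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
n = 10_000
x = rng.standard_normal((n, 1)) + 1j * rng.random((n, 1))


def abs2(z):
    return z.real ** 2 + z.imag ** 2


# Each candidate is wrapped in a lambda, so all of them pay the same
# wrapper-call overhead and the comparison stays fair.
candidates = {
    "abs(x)**2": lambda: abs(x) ** 2,
    "x.real*x.real + x.imag*x.imag": lambda: x.real * x.real + x.imag * x.imag,
    "x.real**2 + x.imag**2": lambda: x.real ** 2 + x.imag ** 2,
    "abs2(x)": lambda: abs2(x),
}

runs = 200
for label, fn in candidates.items():
    # Best of 3 repeats, reported as microseconds per single evaluation.
    best = min(timeit.repeat(fn, number=runs, repeat=3)) / runs
    print(f"{label:32s} {best * 1e6:8.1f} us per call")
```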

Answered By: eat

I’ll agree with everyone else that such optimizations will just cause you pain on CPython, and that if you care about performance you should consider PyPy (though our NumPy may be too incomplete to be useful). However, I’ll disagree on one point: you can care about such optimizations on PyPy. Not this one specifically, since, as has been said, PyPy does that automatically. But if you know PyPy well, you really can tune your code to make PyPy emit the assembly you want, not that you almost ever need to.

Answered By: Alex Gaynor

Not exactly what the OP has asked for, but close:

Inliner inlines Python function calls. Proof of concept for this blog post

from inliner import inline

@inline
def add_stuff(x, y):
    return x + y

def add_lots_of_numbers():
    results = []
    for i in xrange(10):
        results.append(add_stuff(i, i+1))

In the above code the add_lots_of_numbers function is converted into
this:

def add_lots_of_numbers():
    results = []
    for i in xrange(10):
        results.append(i + i + 1)

Anyone interested in this question, and in the complications involved in implementing such an optimizer in CPython, might also want to have a look at:

Answered By: AXO

You can try to use lambda:

abs2 = lambda x : x.real*x.real+x.imag*x.imag

then call it by:

y = abs2(x)

Answered By: Linfeng Mu