Which is faster in Python: x**.5 or math.sqrt(x)?

Question

I’ve been wondering about this for some time. As the title says, which is faster: the actual function or simply raising to the half power?

UPDATE

This is not a matter of premature optimization. This is simply a question of how the underlying code actually works. What is the theory of how Python code works?

I sent Guido van Rossum an email because I really wanted to know the differences between these methods.

My email:

There are at least 3 ways to do a square root in Python: math.sqrt, the
‘**’ operator and pow(x,.5). I’m just curious as to the differences in
the implementation of each of these. When it comes to efficiency which
is better?

His response:

pow and ** are equivalent; math.sqrt doesn’t work for complex numbers,
and links to the C sqrt() function. As to which one is
faster, I have no idea…
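The complex-number point in the reply can be checked directly. This is only a small sketch, assuming Python 3 (where a negative float raised to a fractional power yields a complex result); the variable names are mine, not from the email:

```python
import cmath
import math

x = -1.0
real_only_error = None

try:
    math.sqrt(x)                  # the real-valued function rejects negative input
except ValueError as exc:
    real_only_error = str(exc)

via_operator = x ** 0.5           # Python 3: returns a complex number
via_pow = pow(x, 0.5)             # same operation as the ** operator
via_cmath = cmath.sqrt(x)         # complex-aware square root from the cmath module

print("math.sqrt raised:", real_only_error)
print(via_operator, via_pow, via_cmath)
```

So ** and pow() agree with each other, while math.sqrt only covers non-negative real input; cmath.sqrt is the library function to reach for when complex results are wanted.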

Asked By: Nope

Answer 1

Most likely math.sqrt(x), because it’s optimized for square rooting.

Benchmarks will provide you the answer you are looking for.
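A minimal way to run such a benchmark with the standard timeit module might look like this. The helper name bench and the test value 123.0 are my own choices, and the numbers will differ between machines and Python versions:

```python
import timeit

def bench(stmt, setup="x = 123.0", number=200_000, repeat=5):
    """Return the best observed time per call, in microseconds."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e6

results = {
    "x ** 0.5": bench("x ** 0.5"),
    "pow(x, 0.5)": bench("pow(x, 0.5)"),
    "math.sqrt(x)": bench("math.sqrt(x)", setup="import math; x = 123.0"),
}

# Print the fastest variant first.
for label, usec in sorted(results.items(), key=lambda item: item[1]):
    print(f"{label:<14} {usec:.4f} usec per call")
```

Taking the minimum over several repeats reduces the influence of other processes on the measurement.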

Answered By: strager

Answer 2

How many square roots are you really performing? Are you trying to write some 3D graphics engine in Python? If not, then why choose code that is cryptic over code that is easy to read? The time difference would be less than anybody could notice in just about any application I could foresee. I really don’t mean to put down your question, but it seems that you’re going a little too far with premature optimization.

Answered By: Kibbee

Answer 3

math.sqrt(x) is significantly faster than x**0.5.

import math
N = 1000000

%%timeit
for i in range(N):
    z=i**.5

10 loops, best of 3: 156 ms per loop

%%timeit
for i in range(N):
    z=math.sqrt(i)

10 loops, best of 3: 91.1 ms per loop

Using Python 3.6.9 (notebook).

Answered By: Claudiu

Answer 4

In these micro-benchmarks, math.sqrt will be slower, because of the slight time it takes to look up sqrt in the math namespace. You can improve it slightly with

 from math import sqrt

Even then, though, running a few variations through timeit shows a slight (4-5%) performance advantage for x**.5

Interestingly, doing

 import math
 sqrt = math.sqrt

sped it up even more, to within 1% difference in speed, with very little statistical significance.
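The lookup cost can be made visible without timing anything, by checking which names each variant has to resolve at run time. This is a sketch for CPython; the three function names are hypothetical and only serve the illustration:

```python
import math
from math import sqrt

def via_module(x):
    return math.sqrt(x)    # global lookup of `math`, then attribute lookup of `sqrt`

def via_name(x):
    return sqrt(x)         # a single global lookup of `sqrt`

def via_local(x, _sqrt=math.sqrt):
    return _sqrt(x)        # bound as a default argument, so it is a local variable

# co_names lists the global and attribute names a function's bytecode refers to.
print(via_module.__code__.co_names)
print(via_name.__code__.co_names)
print(via_local.__code__.co_names)
```

The fewer names that have to be resolved on every call, the less overhead the benchmark attributes to the square root itself.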

I will repeat Kibbee, and say that this is probably a premature optimization.

Answered By: JimB

Answer 5

first rule of optimization: don’t do it
second rule: don’t do it, yet

Here’s some timings (Python 2.5.2, Windows):

$ python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
1000000 loops, best of 3: 0.445 usec per loop

$ python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
1000000 loops, best of 3: 0.574 usec per loop

$ python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
1000000 loops, best of 3: 0.727 usec per loop

This test shows that x**.5 is slightly faster than sqrt(x).

For Python 3.0 the result is the opposite:

$ \Python30\python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
1000000 loops, best of 3: 0.803 usec per loop

$ \Python30\python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
1000000 loops, best of 3: 0.695 usec per loop

$ \Python30\python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
1000000 loops, best of 3: 0.761 usec per loop

math.sqrt(x) is always faster than x**.5 on another machine (Ubuntu, Python 2.6 and 3.1):

$ python -mtimeit -s"from math import sqrt; x = 123" "x**.5"
10000000 loops, best of 3: 0.173 usec per loop
$ python -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
10000000 loops, best of 3: 0.115 usec per loop
$ python -mtimeit -s"import math; x = 123" "math.sqrt(x)"
10000000 loops, best of 3: 0.158 usec per loop
$ python3.1 -mtimeit -s"from math import sqrt; x = 123" "x**.5"
10000000 loops, best of 3: 0.194 usec per loop
$ python3.1 -mtimeit -s"from math import sqrt; x = 123" "sqrt(x)"
10000000 loops, best of 3: 0.123 usec per loop
$ python3.1 -mtimeit -s"import math; x = 123" "math.sqrt(x)"
10000000 loops, best of 3: 0.157 usec per loop

Answered By: jfs

Answer 6

For what it’s worth (see Jim’s answer). On my machine, running python 2.5:

PS C:\> python -m timeit -n 100000 10000**.5
100000 loops, best of 3: 0.0543 usec per loop
PS C:\> python -m timeit -n 100000 -s "import math" math.sqrt(10000)
100000 loops, best of 3: 0.162 usec per loop
PS C:\> python -m timeit -n 100000 -s "from math import sqrt" sqrt(10000)
100000 loops, best of 3: 0.0541 usec per loop

Answered By: zdan

Answer 7

Using Claudiu’s code on my machine, x**.5 is faster even with “from math import sqrt”. With psyco.full(), however, sqrt(x) becomes much faster, by at least 200%.

Answered By: Nope

Answer 8

Claudiu’s results differ from mine. I’m using Python 2.6 on Ubuntu on an old P4 2.4 GHz machine… Here are my results:

>>> timeit1()
Took 0.564911 seconds
>>> timeit2()
Took 0.403087 seconds
>>> timeit1()
Took 0.604713 seconds
>>> timeit2()
Took 0.387749 seconds
>>> timeit1()
Took 0.587829 seconds
>>> timeit2()
Took 0.379381 seconds

sqrt is consistently faster for me… Even Codepad.org NOW seems to agree that sqrt, in the local context, is faster (http://codepad.org/6trzcM3j). Codepad seems to be running Python 2.5 presently. Perhaps they were using 2.4 or older when Claudiu first answered?

In fact, even using math.sqrt(i) in place of arg(i), I still get better times for sqrt. In this case timeit2() took between 0.53 and 0.55 seconds on my machine, which is still better than the 0.56-0.60 figures from timeit1.

I’d say, on modern Python, use math.sqrt and definitely bring it to local context, either with somevar=math.sqrt or with from math import sqrt.

Answered By: bobpaul

Answer 9

In Python 2.6 the (float).__pow__() function uses the C pow() function, and the math.sqrt() function uses the C sqrt() function.

In glibc, the implementation of pow(x,y) is quite complex and well optimized for various special cases. For example, calling C pow(x,0.5) simply calls the sqrt() function.

The difference in speed between ** and math.sqrt is caused by the wrappers around the C functions, and it depends strongly on the C compiler and optimization flags used on the system.
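Part of that wrapper work can be observed from Python itself. The sketch below assumes Python 3 and an int argument, as used in most of the benchmarks in this thread; it only shows the dispatch steps, not the C code behind them:

```python
import math

x = 2  # an int, like the loop counters in the benchmarks

# int.__pow__ does not handle a float exponent and reports that to the interpreter...
print(x.__pow__(0.5))        # NotImplemented

# ...so the ** operator falls back to the float's reflected method.
print((0.5).__rpow__(x))
print(x ** 0.5)

# math.sqrt converts its argument to a float first and then computes the root.
print(math.sqrt(x))
```

With a float argument the first step succeeds immediately, so the amount of dispatch overhead also depends on the type being passed in.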

Edit:

Here are the results of Claudiu’s algorithm on my machine. I got different results:

zoltan@host:~$ python2.4 p.py 
Took 0.173994 seconds
Took 0.158991 seconds
zoltan@host:~$ python2.5 p.py 
Took 0.182321 seconds
Took 0.155394 seconds
zoltan@host:~$ python2.6 p.py 
Took 0.166766 seconds
Took 0.097018 seconds

Answered By: zoli2k

Answer 10

Someone commented about the “fast Newton-Raphson square root” from Quake 3… I implemented it with ctypes, but it’s super slow in comparison to the native versions. I’m going to try a few optimizations and alternate implementations.

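from ctypes import c_float, c_int32, byref, POINTER, cast

def sqrt(num):
 xhalf = 0.5*num
 x = c_float(num)
 # reinterpret the float's 32 bits as an integer (c_long may be 64 bits wide)
 i = cast(byref(x), POINTER(c_int32)).contents.value
 i = c_int32(0x5f375a86 - (i>>1))
 # reinterpret the integer bits as a float: a rough guess at 1/sqrt(num)
 x = cast(byref(i), POINTER(c_float)).contents.value

 # two Newton-Raphson steps refine the inverse square root
 x = x*(1.5-xhalf*x*x)
 x = x*(1.5-xhalf*x*x)
 return x * num  # num * (1/sqrt(num)) == sqrt(num)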

Here’s another method using struct, comes out about 3.6x faster than the ctypes version, but still 1/10 the speed of C.

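from struct import pack, unpack

def sqrt_struct(num):
 xhalf = 0.5*num
 # '<I' and '<f' are both exactly 4 bytes; pack num itself, not a fixed value
 i = unpack('<I', pack('<f', num))[0]
 i = 0x5f375a86 - (i>>1)
 x = unpack('<f', pack('<I', i))[0]

 # two Newton-Raphson steps refine the inverse square root
 x = x*(1.5-xhalf*x*x)
 x = x*(1.5-xhalf*x*x)
 return x * num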
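To get a feel for how accurate the trick is, the approximation can be compared against math.sqrt. This is a self-contained sketch: approx_sqrt is a hypothetical restatement of the struct approach with fixed-width format codes, and it assumes positive input:

```python
import math
import struct

MAGIC = 0x5f375a86  # the constant used in the code above

def approx_sqrt(num):
    """Approximate sqrt(num) for num > 0 via the inverse-square-root bit trick."""
    half = 0.5 * num
    # Reinterpret the 32-bit float pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", num))[0]
    bits = MAGIC - (bits >> 1)
    # Reinterpret back as a float: a rough first guess at 1/sqrt(num).
    guess = struct.unpack("<f", struct.pack("<I", bits))[0]
    # Two Newton-Raphson refinements of the inverse square root.
    guess *= 1.5 - half * guess * guess
    guess *= 1.5 - half * guess * guess
    return guess * num

for value in (0.25, 2.0, 28.0, 10_000.0):
    exact = math.sqrt(value)
    approx = approx_sqrt(value)
    print(f"{value:>8}: exact={exact:.6f} approx={approx:.6f} "
          f"rel_err={abs(approx - exact) / exact:.2e}")
```

The result is an approximation, so besides being slower in pure Python it is also less precise than the native functions.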

Answered By: lunixbochs

Answer 11

What would be even faster is if you went into math.py and copied the function “sqrt” into your program. It takes time for your program to find math.py, then open it, find the function you are looking for, and then bring that back to your program. If that function is faster even with the “lookup” steps, then the function itself has to be awfully fast. Probably will cut your time in half. In summary:

Go to math.py
Find the function “sqrt”
Copy it
Paste function into your program as the sqrt finder.
Time it.

Answered By: PyGuy

Answer 12

The problem SQRMINSUM that I solved recently requires computing square roots repeatedly on a large dataset. The oldest two submissions in my history, made before I added other optimizations, differ solely by replacing **0.5 with sqrt(), which reduced the runtime from 3.74s to 0.51s in PyPy. This is almost twice the already massive 400% improvement that Claudiu measured.

Answered By: Nadstratosfer Gonczy

Answer 13

The Pythonic thing to optimize for is readability. For this I think explicit use of the sqrt function is best. Having said that, let’s investigate performance anyway.

I updated Claudiu’s code for Python 3 and also made it impossible to optimize away the calculations (something a good Python compiler may do in the future):

from sys import version
from time import time
from math import sqrt, pi, e

print(version)

N = 1_000_000

def timeit1():
  z = N * e
  s = time()
  for n in range(N):
    z += (n * pi) ** .5 - z ** .5
  print (f"Took {(time() - s):.4f} seconds to calculate {z}")

def timeit2():
  z = N * e
  s = time()
  for n in range(N):
    z += sqrt(n * pi) - sqrt(z)
  print (f"Took {(time() - s):.4f} seconds to calculate {z}")

def timeit3(arg=sqrt):
  z = N * e
  s = time()
  for n in range(N):
    z += arg(n * pi) - arg(z)
  print (f"Took {(time() - s):.4f} seconds to calculate {z}")

timeit1()
timeit2()
timeit3()

Results vary, but a sample output is:

3.6.6 (default, Jul 19 2018, 14:25:17) 
[GCC 8.1.1 20180712 (Red Hat 8.1.1-5)]
Took 0.3747 seconds to calculate 3130485.5713865166
Took 0.2899 seconds to calculate 3130485.5713865166
Took 0.2635 seconds to calculate 3130485.5713865166

And a more recent output:

3.7.4 (default, Jul  9 2019, 16:48:28) 
[GCC 8.3.1 20190223 (Red Hat 8.3.1-2)]
Took 0.2583 seconds to calculate 3130485.5713865166
Took 0.1612 seconds to calculate 3130485.5713865166
Took 0.1563 seconds to calculate 3130485.5713865166

Try it yourself.

Answered By: hkBst

Answer 14

Of course, if one is dealing with literals and needs a constant value, the Python compiler can pre-calculate the value at compile time if it is written with operators, so there is no need to profile each version in this case:

In [78]: def a(): 
    ...:     return 2 ** 0.5 
    ...:                                                                                                                                  

In [79]: import dis                                                                                                                       

In [80]: dis.dis(a)                                                                                                                       
  2           0 LOAD_CONST               1 (1.4142135623730951)
              2 RETURN_VALUE
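The same thing can be checked without reading the disassembly, by looking at the constants stored in the compiled function. This is a sketch assuming a CPython version that folds constant expressions; the two function names are made up for the comparison:

```python
import math

def with_operator():
    return 2 ** 0.5         # both operands are literals

def with_function():
    return math.sqrt(2)     # a function call, resolved at run time

# If the expression was folded, the finished value is stored among the constants.
folded = 2 ** 0.5 in with_operator.__code__.co_consts
print("operator version folded:", folded)

# The function-call version still has to look up math and sqrt on every call.
print("names used by the call version:", with_function.__code__.co_names)
```

This only applies to literals; as soon as a variable is involved, the power is computed at run time like any other expression.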

Answered By: jsbueno

Answer 15

Hello! I just made a Stack Exchange profile to participate in this conversation!
What I am doing might seem trivial, but hear me out once before judging:

Experiment Conditions:

Offline (no internet compiler issues)

Keeping my system state as stable as possible

In one attempt testing all 3 functions

I ran 3 loops of 5 iterations each, for each function stated in the original question. And I calculated the square root for Integers from 0 to 10^8 in each loop.

Here are the results:
Time Taken:
sqrt(x) < x**0.5 < pow(x, 0.5)

Note: By a margin of double-digit seconds, over 10^8 non-negative
integers.

My Conclusion:

I feel Guido’s email justifies these timings well.
Consider the following statements:

"math.sqrt() links to C and does not entertain complex numbers"
"** and pow() are equivalent"

We can thus infer that ** and pow() both carry some overhead, since they have to check whether the input is a complex number, even if we pass an integer. Moreover, complex numbers are built into Python, and using Python to write Python code is taxing on the computer.

And very notably, math.sqrt() runs faster not only because it does not have to check for complex-number arguments, but also because it is directly connected to the C function, which is generally a little faster than Python code.

Let me know in case your opinion differs from mine in this conclusion!

Code:

import time
import math
print("x**0.5 : ")
for _ in range(5):
    start = time.time()
    for i in range(int(1e8)):
        i**0.5
    end = time.time()
    print(end-start)
print("math.sqrt(x) : ")
for _ in range(5):
    start = time.time()
    for i in range(int(1e8)):
        math.sqrt(i)
    end = time.time()
    print(end-start)
print("pow(x,0.5) : ")
for _ in range(5):
    start = time.time()
    for i in range(int(1e8)):
        pow(i,0.5)
    end = time.time()
    print(end-start)

Answered By: EmperorArthurIX