# Why is x**4.0 faster than x**4 in Python 3?

## Question:

Why is `x**4.0` faster than `x**4`? I am using CPython 3.5.2.

```
$ python -m timeit "for x in range(100):" " x**4.0"
10000 loops, best of 3: 24.2 usec per loop
$ python -m timeit "for x in range(100):" " x**4"
10000 loops, best of 3: 30.6 usec per loop
```

I tried changing the exponent to see how it behaves. For example, if I raise x to the power of 10 or 16, the time jumps from 30 to 35, but if I raise it to **10.0** as a float, it stays around 24.1~4.

I guess it has something to do with float conversion and powers of 2 maybe, but I don’t really know.

I noticed that in both cases powers of 2 are faster, I guess since those calculations are more native/easy for the interpreter/computer. But still, with floats it's almost not moving: `2.0 => 24.1~4 & 128.0 => 24.1~4` **but** `2 => 29 & 128 => 62`

TigerhawkT3 pointed out that it doesn’t happen outside of the loop. I checked and the situation only occurs (from what I’ve seen) when the **base** is getting raised. Any idea about that?

## Answers:

If we look at the bytecode, we can see that the expressions are identical. The only difference is the type of the constant that will be an argument of `BINARY_POWER`. So it's most certainly due to an `int` being converted to a floating point number down the line.

```
>>> def func(n):
...     return n**4
...
>>> def func1(n):
...     return n**4.0
...
>>> from dis import dis
>>> dis(func)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (4)
              6 BINARY_POWER
              7 RETURN_VALUE
>>> dis(func1)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (4.0)
              6 BINARY_POWER
              7 RETURN_VALUE
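The same comparison can be made programmatically. The following is only a sketch: the helper name `non_load_ops` is made up for this example, and it assumes `dis.get_instructions` (available since Python 3.4). Load instructions are filtered out because some newer CPython versions may use a dedicated opcode for small integer constants, and the power opcode is named `BINARY_OP` instead of `BINARY_POWER` on newer versions, so the two functions are compared with each other rather than against fixed names.

```python
import dis


def func(n):
    return n ** 4


def func1(n):
    return n ** 4.0


def non_load_ops(f):
    """Opcode names of f, ignoring the LOAD_* instructions (hypothetical helper)."""
    return [ins.opname for ins in dis.get_instructions(f)
            if not ins.opname.startswith("LOAD_")]


# Apart from how the operands are loaded, both functions run the same opcodes.
print(non_load_ops(func))
print(non_load_ops(func) == non_load_ops(func1))
```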

Update: let’s take a look at Objects/abstract.c in the CPython source code:

```
PyObject *
PyNumber_Power(PyObject *v, PyObject *w, PyObject *z)
{
    return ternary_op(v, w, z, NB_SLOT(nb_power), "** or pow()");
}
```

`PyNumber_Power` calls `ternary_op`, which is too long to paste here.

It calls the `nb_power` slot of `x`, passing `y` as an argument.

Finally, in `float_pow()` at line 686 of Objects/floatobject.c we see that arguments are converted to a C `double` right before the actual operation:

```
static PyObject *
float_pow(PyObject *v, PyObject *w, PyObject *z)
{
    double iv, iw, ix;
    int negate_result = 0;

    if ((PyObject *)z != Py_None) {
        PyErr_SetString(PyExc_TypeError, "pow() 3rd argument not "
            "allowed unless all arguments are integers");
        return NULL;
    }

    CONVERT_TO_DOUBLE(v, iv);
    CONVERT_TO_DOUBLE(w, iw);
    ...
```
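The dispatch described above can be observed from Python itself. This is a rough illustration, not a trace of the C code: it assumes the special methods `int.__pow__` and `float.__rpow__` behave like the corresponding `nb_power` slots, and it relies on the fact that `int`'s implementation declines a float exponent by returning `NotImplemented`, after which the float implementation takes over.

```python
x, y = 2, 4.0

# int's own power implementation does not handle a float exponent...
print(int.__pow__(x, y))      # NotImplemented

# ...so the float implementation handles x ** y, after converting
# the int operand to a float (a C double underneath).
print(float.__rpow__(y, x))   # 16.0
print(type(x ** y).__name__)  # float
```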

Why is `x**4.0` *faster* than `x**4` in Python 3\*?

Python 3 `int` objects are full-fledged objects designed to support an arbitrary size; because of that, they are handled as such at the C level (see how all variables are declared as `PyLongObject *` type in `long_pow`). This also makes their exponentiation a lot *trickier* and more *tedious*, since you need to play around with the `ob_digit` array it uses to represent its value to perform it. (Source for the brave. See: Understanding memory allocation for large integers in Python for more on `PyLongObject`s.)
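To get a feel for what the `ob_digit` array holds, here is a conceptual sketch in Python. It is not CPython's code: `to_digits` and `from_digits` are hypothetical helpers that split a non-negative integer into fixed-width "digits" the way the C struct stores them, using the digit width reported by `sys.int_info`.

```python
import sys

# Width of one internal digit: 30 on most 64-bit builds, 15 on some others.
BITS = sys.int_info.bits_per_digit


def to_digits(n):
    """Split a non-negative int into base-2**BITS digits, least significant first."""
    digits = []
    while True:
        n, digit = divmod(n, 1 << BITS)
        digits.append(digit)
        if n == 0:
            return digits


def from_digits(digits):
    """Rebuild the int from its digits."""
    return sum(d << (BITS * i) for i, d in enumerate(digits))


n = 99 ** 4
print(BITS, to_digits(n))
print(from_digits(to_digits(n)) == n)
```

Every multiplication inside an integer power has to loop over arrays like these, even when the values would have fit in a machine word.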

Python `float` objects, on the contrary, *can be transformed* to a C `double` type (by using `PyFloat_AsDouble`) and operations can be performed using those native types. *This is great* because, after checking for relevant edge cases, it allows Python to use the platform's `pow` (C's `pow`, that is) to handle the actual exponentiation:

```
    /* Now iv and iw are finite, iw is nonzero, and iv is
     * positive and not equal to 1.0. We finally allow
     * the platform pow to step in and do the rest.
     */
    errno = 0;
    PyFPE_START_PROTECT("pow", return NULL)
    ix = pow(iv, iw);

where `iv` and `iw` are our original `PyFloatObject`s as C `double`s.
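One way to check this from Python, assuming CPython: `math.pow` converts both arguments to floats and also relies on the platform's `pow`, so for ordinary inputs it should agree with the `**` operator when the exponent is a float. This is only a sanity check, not proof of what happens internally.

```python
import math

# Collect every x for which the ** operator and math.pow disagree.
mismatches = [x for x in range(100) if x ** 4.0 != math.pow(x, 4.0)]

# Expected to be empty on CPython, since both paths use C doubles.
print(mismatches)
```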

For what it's worth: Python `2.7.13` for me is a factor `2~3` faster, and shows the inverse behaviour.

The previous fact *also explains* the discrepancy between Python 2 and 3, so I thought I'd address this comment too because it is interesting.

In Python 2, you're using the old `int` object, which differs from the `int` object in Python 3 (all `int` objects in 3.x are of `PyLongObject` type). In Python 2, there's a distinction that depends on the value of the object (or on whether you use the suffix `L`/`l`):

```
# Python 2
type(30) # <type 'int'>
type(30L) # <type 'long'>
```

The `<type 'int'>` you see here *does the same thing floats do*: it gets safely converted into a C `long` when exponentiation is performed on it (`int_pow` also hints the compiler to put them in a register if it can do so, so that *could* make a difference):

```
static PyObject *
int_pow(PyIntObject *v, PyIntObject *w, PyIntObject *z)
{
    register long iv, iw, iz=0, ix, temp, prev;
    /* Snipped for brevity */
```

This allows for a good speed gain.

To see how sluggish `<type 'long'>`s are in comparison to `<type 'int'>`s, if you wrapped the `x` name in a `long` call in Python 2 (essentially forcing it to use `long_pow` as in Python 3), the speed gain disappears:

```
# <type 'int'>
(python2) ➜ python -m timeit "for x in range(1000):" " x**2"
10000 loops, best of 3: 116 usec per loop
# <type 'long'>
(python2) ➜ python -m timeit "for x in range(1000):" " long(x)**2"
100 loops, best of 3: 2.12 msec per loop
```

Take note that, though one snippet converts the `int` to a `long` while the other does not (as pointed out by @pydsinger), this cast is not the contributing force behind the slowdown. The implementation of `long_pow` is. (Time the statements solely with `long(x)` to see.)

[…] it doesn’t happen outside of the loop. […] Any idea about that?

This is CPython's peephole optimizer folding the constants for you. You get the exact same timings in either case, since no actual computation is needed to find the result of the exponentiation, only the loading of values:

```
dis.dis(compile('4 ** 4', '', 'exec'))
  1           0 LOAD_CONST               2 (256)
              3 POP_TOP
              4 LOAD_CONST               1 (None)
              7 RETURN_VALUE
```

Identical bytecode is generated for `'4 ** 4.'`, the only difference being that the `LOAD_CONST` loads the float `256.0` instead of the int `256`:

```
dis.dis(compile('4 ** 4.', '', 'exec'))
  1           0 LOAD_CONST               3 (256.0)
              2 POP_TOP
              4 LOAD_CONST               2 (None)
              6 RETURN_VALUE
```

So the times are identical.
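A version-tolerant way to check the folding yourself is to look for a power opcode in the compiled code. The sketch below is an illustration only: `computes_power_at_runtime` is a made-up helper, it compiles in `eval` mode so the expression's value is kept, and it accepts both `BINARY_POWER` (older CPython) and `BINARY_OP` (newer CPython) as "a power is computed at run time".

```python
import dis

# The opcode name depends on the CPython version.
POWER_OPS = {"BINARY_POWER", "BINARY_OP"}


def computes_power_at_runtime(source):
    """True if the compiled expression still contains a power opcode."""
    code = compile(source, "<example>", "eval")
    return any(ins.opname in POWER_OPS for ins in dis.get_instructions(code))


print(computes_power_at_runtime("4 ** 4"))   # constants only: folded at compile time
print(computes_power_at_runtime("4 ** 4."))  # same for the float exponent
print(computes_power_at_runtime("x ** 4"))   # x is unknown, so it must be computed
```

This is why the difference only shows up inside the loop, where the base is a variable.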

\*All of the above applies solely to CPython, the reference implementation of Python. Other implementations might perform differently.

Because one is exact and the other is an approximation.

```
>>> 334453647687345435634784453567231654765 ** 4.0
1.2512490121794596e+154
>>> 334453647687345435634784453567231654765 ** 4
125124901217945966595797084130108863452053981325370920366144
719991392270482919860036990488994139314813986665699000071678
41534843695972182197917378267300625
```
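The difference can be checked directly. In this small sketch the variable names are only for illustration; it reuses the number from the session above and compares the exact integer result with the float result converted back to an integer.

```python
n = 334453647687345435634784453567231654765

exact = n ** 4      # arbitrary-precision int, every digit is correct
approx = n ** 4.0   # C double, roughly 15 to 17 significant decimal digits

print(type(exact).__name__, type(approx).__name__)

# The float result cannot represent the exact value...
print(int(approx) == exact)

# ...but it is very close in relative terms.
print(abs(int(approx) - exact) / exact)
```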