What is the reason a Python3 loop is taking so much longer than Node.js?

Question:

First of all, some readers have dismissed this as not being a valid question. But my goal is to find out what minimum running time I should expect from an O(n²) algorithm when n is 10,000 or 100,000, and for that purpose the loop below is a perfectly valid test.

I first wrote a JavaScript test:

const n = 10000;
const n2 = n * n;

let a = 0;

for (let i = 0; i < n2; i++) {
  a += 3.1;
  a -= 1.01;
  a -= 2.0001;
}

console.log(a);

and ran it:

$ time node try.js 
node try.js  0.31s user 0.01s system 98% cpu 0.321 total

so it took 0.32 seconds to finish.

But then I tried it in Python3, on a MacBook Air M2:

n = 10000
n2 = n * n

a = 0
for i in range(n2):
    a += 3.1
    a -= 1.01
    a -= 2.0001

print(a)

and it took 9.88 seconds:

$ time python3 try.py
python3 try.py  9.88s user 0.04s system 99% cpu 9.948 total

I don’t quite get why the JavaScript is 30 times faster than the Python code. I would have used xrange() in Python 2, but Python 3 no longer has it; range() took its place, and it won’t build a huge list because it produces its values lazily.

Did I do something wrong or could I make it run faster (more like less than 1 second)?

Asked By: Stefanie Gauss


Answers:

Most (all?) modern JavaScript engines use just-in-time (JIT) compilation. For a simple loop like

for (let i = 0; i < n2; i++) {
  a = a + 3;
  a = a - 1;
  a = a - 2;
}

the machine code produced is going to be very simple to optimize. CPython does not do JIT compilation, so it interprets every iteration and the loop takes more time. This is especially pronounced when the loop overhead itself accounts for the bulk of the running time, rather than the operations inside the loop.
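To make the interpretation overhead a bit more concrete, here is a minimal sketch that disassembles the question's loop with the standard-library dis module. The function name loop is only an illustration, and the exact opcodes printed differ between Python versions; the point is that CPython steps through several bytecode instructions on every pass of the loop.

```python
import dis


def loop(n2):
    # Same loop body as in the question, wrapped in a function
    # (the name "loop" is hypothetical, chosen for this sketch).
    a = 0
    for i in range(n2):
        a += 3.1
        a -= 1.01
        a -= 2.0001
    return a


# Collect the bytecode instructions that CPython's interpreter executes.
instructions = list(dis.get_instructions(loop))

# Print the disassembly; the exact opcodes vary between Python versions.
dis.dis(loop)
print(len(instructions), "bytecode instructions in the whole function")
```

Each of the three arithmetic lines shows up as a load, an arithmetic operation, and a store, and all of them are dispatched by the interpreter on every iteration.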

In Python, your best options are likely:

Use PyPy

While PyPy doesn’t have all the features of CPython, it does JIT compilation like Node. I get a runtime of 0.296 s using your sample.

Try it online!

Use NumPy

NumPy uses C to do the heavy lifting of loops and operations, so it performs much better. Chunking your giant loop into many NumPy operations can help considerably, as in this example, which runs in 0.657 s:

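import numpy as np

n = 10000

total = 0
for i in range(n):
    # Each pass handles n iterations of the original loop as vectorized NumPy operations
    total += np.sum(np.full(n, 3) - np.full(n, 1) - np.full(n, 2))

print(total)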

Try it online!

Use Numba

Numba lets you JIT-compile individual Python functions, and it runs on standard CPython.

import numba

@numba.jit
def do_test():
    n = 10000**2
    a = 0
    for i in range(n):
        a = a + 3
        a = a - 1
        a = a - 2
    return a

print(do_test())  # the first call triggers compilation

I don’t have a try-it-online for this, but it runs in 0.265 s on my machine (not a fair comparison to the other numbers, since it’s a different machine).
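Since the timings in this answer come from different machines, it may help to measure on your own. The following is a rough sketch using only the standard library: it times a scaled-down run of the question's loop and extrapolates linearly to n * n iterations. The helper names run_loop and estimate_seconds and the sample size are assumptions made for this example, and because the loop runs inside a function here, the figure may come out lower than for the module-level script in the question.

```python
import time


def run_loop(iterations):
    # Scaled-down version of the loop from the question (hypothetical helper).
    a = 0
    for i in range(iterations):
        a += 3.1
        a -= 1.01
        a -= 2.0001
    return a


def estimate_seconds(target_iterations, sample_iterations=200_000):
    # Time a small sample, then scale linearly to the target iteration count.
    start = time.perf_counter()
    run_loop(sample_iterations)
    elapsed = time.perf_counter() - start
    return elapsed * (target_iterations / sample_iterations)


n = 10000
print(f"estimated time for n*n iterations: {estimate_seconds(n * n):.2f} s")
```

The linear extrapolation is only an approximation, but it gives a quick per-machine baseline to compare against PyPy, NumPy, or Numba runs.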

Answered By: Kaia