How can Node.js be faster than C and Java? Benchmark comparing Node.js, C, Java and Python

Question:

I made a very simple benchmarking program that calculates all the prime numbers up to 10,000,000 in 4 different languages.

  • (2.97 seconds) – node.js (javascript) (4.4.5)
  • (6.96 seconds) – c (c99)
  • (6.91 seconds) – java (1.7)
  • (45.5 seconds) – python (2.7)

The times above are the average user time of 3 runs each.

Node.js runs by far the fastest. This is confusing to me for two reasons:

  1. JavaScript always uses double-precision floats for its numbers, while the C and Java versions here use (long) integers. Math with integers should be faster.
  2. JavaScript is often referred to as interpreted, when it is actually a just-in-time compiled language. But even so, how can a JIT compiler be faster than a fully compiled language?

The Python code runs the slowest, which is no surprise, but why isn't the Node.js code running at a similar speed to the Python?

I compiled the C code with -O2 optimization; I also tried all the other optimization levels and it didn't make a noticeable difference.

primeCalc.js

"use strict";

var isPrime = function(n){
    //if (n !== parseInt(n,10)) {return false};
    if (n < 2) {return false};
    if (n === 2) {return true};
    if (n === 3) {return true};
    if (n % 2 === 0) {return false};
    if (n % 3 === 0) {return false};
    if (n % 1) {return false};
    var sqrtOfN = Math.sqrt(n);
    for (var i = 5; i <= sqrtOfN; i += 6){
        if (n % i === 0) {return false}
        if (n % (i + 2) === 0) {return false}
    }
    return true;
};

var countPrime = function(){
    var count = 0;
    for (let i = 1; i < 10000000; i++) {
        if (isPrime(i)){
            count++;
        }
    }
    console.log('total',count);
};

countPrime();

node.js results

$ time node primeCalc.js
total 664579

real    0m2.965s
user    0m2.928s
sys     0m0.016s

$ node --version
v4.4.5

primeCalc.c

#include <stdio.h>
#include <math.h>

#define true 1
#define false 0

int isPrime (register long n){
    if (n < 2)      return false;
    if (n == 2)     return true;
    if (n == 3)     return true;
    if (n % 2 == 0) return false;
    if (n % 3 == 0) return false;
    if (n % 1)      return false;
    double sqrtOfN = sqrt(n);
    for (long i = 5; i <= sqrtOfN; i += 6){
        if (n % i == 0) return false;
        if (n % (i + 2) == 0) return false;
    }
    return true;
}

int main(int argc, const char * argv[]) {
    register long count = 0;
    for (register long i = 0; i < 10000000; i++){
        if (isPrime(i)){
            count++;
        }
    }

    printf("total %lin",count);
    return 0;
}

c results

$ gcc primeCalc.c -lm -g -O2 -std=c99 -Wall
$ time ./a.out
total 664579
real    0m6.718s
user    0m6.668s
sys     0m0.008s

PrimeCalc.java

public class PrimeCalc {

  public static void main(String[] args) {
     long count = 0;
     for (long i = 0; i < 10000000; i++){
        if (isPrime(i)){
           count++;
        }
     }
     System.out.println("total "+count);
  }


  public static boolean isPrime(long n) {
     if (n < 2)      return false;
     if (n == 2)     return true;
     if (n == 3)     return true;
     if (n % 2 == 0) return false;
     if (n % 3 == 0) return false;
     if (n % 1 > 0)  return false;
     double sqrtOfN = Math.sqrt(n);
     for (long i = 5; i <= sqrtOfN; i += 6){
        if (n % i == 0) return false;
        if (n % (i + 2) == 0) return false;
     }
     return true;
  }

}

java results

 $ javac PrimeCalc.java 
 $ time java PrimeCalc
 total 664579
 real    0m7.197s
 user    0m7.036s
 sys     0m0.040s
 $ java -version
 java version "1.7.0_111"
 OpenJDK Runtime Environment (IcedTea 2.6.7) (7u111-2.6.7-0ubuntu0.14.04.3)
 OpenJDK 64-Bit Server VM (build 24.111-b01, mixed mode)

primeCalc.py

import math

def isPrime (n):
    if n < 2       : return False
    if n == 2      : return True
    if n == 3      : return True
    if n % 2 == 0  : return False
    if n % 3 == 0  : return False
    if n % 1 > 0   : return False
    sqrtOfN = int(math.sqrt(n)) + 1
    for i in xrange(5, sqrtOfN, 6):
        if n % i == 0       : return False
        if n % (i + 2) == 0 : return False
    return True

count = 0
for i in xrange(10000000) :
    if isPrime(i) :
        count+=1

print "total ",count

python results

$ time python primeCalc.py
total  664579
real    0m46.588s
user    0m45.732s
sys     0m0.156s 
$ python --version
Python 2.7.6 

linux

$ uname -a
Linux hoarfrost-node_6-3667558 4.2.0-c9 #1 SMP Wed Sep 30 16:14:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

additional c run times (addendum)

  • (7.81 s) no optimization, gcc primeCalc.c -lm -std=c99 -Wall
  • (8.13 s) optimization 0, gcc primeCalc.c -lm -O0 -std=c99 -Wall
  • (7.30 s) optimization 1, gcc primeCalc.c -lm -O1 -std=c99 -Wall
  • (6.66 s) optimization 2, gcc primeCalc.c -lm -O2 -std=c99 -Wall

    • (average user time of 3 new runs at each optimization level)

I read the post here:
Why is this NodeJS 2x faster than native C?
That code uses an example that doesn't actually do anything significant. It's as if the compiler can figure out the result at compile time, so it doesn't even need to execute the loop 100,000,000 times to come up with the answer.
If one adds another division step to the calculation, the optimization becomes much less significant:

for (long i = 0; i < 100000000; i++) {
  d += i >> 1;    
  d = d / (i +1); // <-- New Term 
}
  • (1.88 seconds) without optimization
  • (1.53 seconds) with optimization (-O2)
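
For reference, here is a self-contained version of that loop (the wrapper around it is mine); the data dependence through d is what keeps the compiler from folding the loop away:

#include <stdio.h>

int main(void) {
    long d = 0;
    for (long i = 0; i < 100000000; i++) {
        d += i >> 1;
        /* The division depends on the running value of d, so the result
           cannot be precomputed at compile time. */
        d = d / (i + 1); // <-- New Term
    }
    printf("d = %li\n", d);
    return 0;
}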

Update 03/15/2017
After reading the answer from @Leon, I ran a few verification tests.

Test 1 – 32-bit BeagleBone Black, 664,579 primes up to 10,000,000

Unedited primeCalc.js and primeCalc.c running on the BeagleBone Black, which has a 32-bit processor.

  • C code = 62 seconds (gcc, long datatype)
  • JS code = 102 seconds (node v4)

Test 2 – 64-bit MacBook Pro, 664,579 primes up to 10,000,000

Replaced the long datatypes in the primeCalc.c code with uint32_t, running on my MacBook Pro, which has a 64-bit processor.

  • C code = 5.73 seconds (clang, long datatype)
  • C code = 2.43 seconds (clang, uint32_t datatype)
  • JS code = 2.12 seconds (node v4)
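
For clarity, the Test 2 change looks roughly like this (my reconstruction, since the uint32_t variant of the file isn't shown here):

#include <stdint.h>
#include <math.h>

/* isPrime with every long replaced by uint32_t */
int isPrime(uint32_t n) {
    if (n < 2)      return 0;
    if (n == 2)     return 1;
    if (n == 3)     return 1;
    if (n % 2 == 0) return 0;
    if (n % 3 == 0) return 0;
    double sqrtOfN = sqrt((double)n);
    for (uint32_t i = 5; i <= sqrtOfN; i += 6) {
        if (n % i == 0)       return 0;
        if (n % (i + 2) == 0) return 0;
    }
    return 1;
}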

Test 3 – 64-bit MacBook Pro, 91,836 primes (i = 1; i < 8,000,000,000; i += 10000)

Used unsigned long datatypes in the C code, forcing JavaScript to use some 64-bit values.

  • C code = 20.4 seconds (clang, long datatype)
  • JS code = 17.8 seconds (node v4)

Test 4 – 64-bit MacBook Pro, 86,277 primes (i = 8,000,000,001; i < 16,000,000,000; i += 10000)

Used unsigned long datatypes in the C code, forcing JavaScript to use all 64 bits.

  • C code = 35.8 seconds (clang, long datatype)
  • JS code = 34.1 seconds (node v4)

Test 5 – Cloud9 64-bit Linux, (i = 0; i < 10,000,000; i++)

language    datatype    time    % to C
javascript  auto        3.22      31%
C           long        7.95     224%
C           int         2.46       0%
Java        long        8.08     229%
Java        int         2.15     -12%
Python      auto       48.43    1872%
Pypy        auto        9.51     287%

Test 6 – Cloud9 64-bit Linux, (i = 8,000,000,001; i < 16,000,000,000; i += 10000)

language    datatype    time    % to C
javascript  auto       52.38      12%
C           long       46.80       0%
Java        long       49.70       6%
Python      auto      268.47     474%
Pypy        auto       56.55      21%

(All results are the average of the user seconds for three runs, time variation between runs < 10%)

Mixed Results

Changing the C and Java datatype to int when the values fit in the integer range significantly sped up execution. On the BBB and Cloud9 computers, switching to ints made C faster than node.js, but on my Mac the node.js program still ran faster. Perhaps that is because the Mac is using clang while the BBB and Cloud9 computers are using gcc. Does anyone know whether clang generally compiles faster programs than gcc?

When using all 64-bit integers, C was a bit faster than node.js on the BBB and the Cloud9 machine, but a little slower on my Mac.

You can also see that PyPy is about four to five times faster than standard Python in these tests.

The fact that node.js is even competitive with C amazes me.

Asked By: Timothy Vann


Answers:

I spent a couple of days investigating the performance difference between JS/V8 and C, focusing first of all on the Hydrogen IR generated by the V8 engine. However, after making sure that no extraordinary optimizations were present there, I went back to analyzing the assembly output, and it struck me that the answer was a very simple one, boiling down to a couple of sentences in Jay Conrod's blog post on the internals of V8:

According to the spec, all numbers in JavaScript are 64-bit floating point doubles. We frequently work with integers though, so V8 represents numbers with 31-bit signed integers whenever possible.

The example at hand allows fitting all computations in 32 bits and node.js takes full advantage of that! The C code utilizes the long type, which on OP’s (as well as my) platform happens to be a 64-bit type. Thus, it is a 32-bit arithmetic vs 64-bit arithmetic issue, mostly due to the expensive division/remainder operation.

If long in the C code is replaced with int, then the binary produced by gcc beats node.js.

Also, if the loop is made to look for primes over a range that lies outside the realm of 32-bit numbers, the performance of the node.js version drops significantly.
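
To get a feel for how much the operand width alone matters, below is a small micro-benchmark of my own (not part of the original investigation). It times 1e8 64-bit remainder operations against 1e8 32-bit ones; the divisor changes every iteration so the compiler cannot substitute a cheaper multiply-by-reciprocal:

#include <stdio.h>
#include <stdint.h>
#include <time.h>

static uint64_t rem_loop_64(uint64_t n, uint64_t iters) {
    uint64_t acc = 0;
    for (uint64_t i = 1; i <= iters; i++)
        acc += n % i;               /* 64-bit remainder */
    return acc;
}

static uint32_t rem_loop_32(uint32_t n, uint32_t iters) {
    uint32_t acc = 0;               /* wraps mod 2^32; we only need the data dependency */
    for (uint32_t i = 1; i <= iters; i++)
        acc += n % i;               /* 32-bit remainder */
    return acc;
}

int main(void) {
    const uint32_t iters = 100000000;  /* 1e8 divisions per loop */
    clock_t t0 = clock();
    volatile uint64_t r64 = rem_loop_64(1234567890123ULL, iters);
    clock_t t1 = clock();
    volatile uint32_t r32 = rem_loop_32(1234567890U, iters);
    clock_t t2 = clock();
    (void)r64; (void)r32;
    printf("64-bit remainders: %.2fs\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("32-bit remainders: %.2fs\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}

On typical x86-64 hardware the 64-bit loop runs noticeably slower, which is consistent with the prime-counting timings in the proof below.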


Proof

The source code used can be found further down in the answer, below the results.

Counting primes less than 10 million with C and node.js

$ gcc count_primes.c -std=c99 -O3 -lm -o count_primes_long
$ sed 's/long/int/g; s/%li/%i/g' count_primes.c > count_primes_int.c
$ gcc count_primes_int.c -std=c99 -O3 -lm -o count_primes_int

# Count primes <10M using C code with (64-bit) long type
$ time ./count_primes_long 0 10000000
The range [0, 10000000) contains 664579 primes

real    0m4.394s
user    0m4.392s
sys 0m0.000s

# Count primes <10M using C code with (32-bit) int type
$ time ./count_primes_int 0 10000000
The range [0, 10000000) contains 664579 primes

real    0m1.386s
user    0m1.384s
sys 0m0.000s

# Count primes <10M using node.js/V8 which transparently does the
# job utilizing 32-bit types
$ time nodejs ./count_primes.js 0 10000000
The range [ 0 , 10000000 ) contains 664579 primes

real    0m1.828s
user    0m1.820s
sys 0m0.004s

Performance figures in the vicinity of the limit of signed 32-bit integers

Counting the primes in a range of length 100,000 starting at the number in the first column:

              | node.js | C (long) 
-----------------------------------
2,000,000,000 | 0.293s  | 0.639s    # fully within the 32-bit range
-----------------------------------
2,147,383,647 | 0.296s  | 0.655s    # fully within the 32-bit range
-----------------------------------
2,147,453,647 | 2.498s  | 0.646s    # 50% within the 32-bit range
-----------------------------------
2,147,483,647 | 4.717s  | 0.652s    # fully outside the 32-bit range
-----------------------------------
3,000,000,000 | 5.449s  | 0.755s    # fully outside the 32-bit range
-----------------------------------

count_primes.js

"use strict";

var isPrime = function(n){
    if (n < 2) {return false};
    if (n === 2) {return true};
    if (n === 3) {return true};
    if (n % 2 === 0) {return false};
    if (n % 3 === 0) {return false};
    var sqrtOfN = Math.sqrt(n);
    for (var i = 5; i <= sqrtOfN; i += 6){
        if (n % i === 0) {return false}
        if (n % (i + 2) === 0) {return false}
    }
    return true;
};

var countPrime = function(S, E){
    var count = 0;
    for (let i = S; i < E; i++) {
        if ( isPrime(i) ) { ++count; }
    }
    return count;
};

if (process.argv.length != 4) {
    console.log('Usage: nodejs count_primes.js <range_start> <range_length>');
    process.exit();
}

var S = parseInt(process.argv[2]);
var N = parseInt(process.argv[3]);
var E = S+N;
var P = countPrime(S, E);
console.log('The range [', S, ',', E, ') contains', P, 'primes');

count_primes.c

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define true 1
#define false 0

int isPrime (register long n){
    if (n < 2)      return false;
    if (n == 2)     return true;
    if (n == 3)     return true;
    if (n % 2 == 0) return false;
    if (n % 3 == 0) return false;
    double sqrtOfN = sqrt(n);
    for (long i = 5; i <= sqrtOfN; i += 6){
        if (n % i == 0) return false;
        if (n % (i + 2) == 0) return false;
    }
    return true;
}

int main(int argc, const char * argv[]) {
    if ( argc != 3 ) {
        fprintf(stderr, "Usage: count_primes <range_start> <range_length>\n");
        exit(1);
    }
    const long S = atol(argv[1]);
    const long N = atol(argv[2]);
    register long count = 0;
    for (register long i = S; i < S + N; i++){
        if ( isPrime(i) ) ++count;
    }
    printf("The range [%li, %li) contains %li primesn", S, S+N, count);
}
Answered By: Leon

When I ran your code in Python with a different algorithm (a sieve of Eratosthenes):

real    0m3.583s
user    0m3.297s
sys     0m0.094s

That is faster than the C benchmark you have above. I think that a simpler language helps you design better algorithms, but that is my opinion.
(You could also use multiprocessing to make it even faster.)

def allPrimes(N):
    is_prime = [1]*N
    # We know 0 and 1 are composites
    is_prime[0] = 0
    is_prime[1] = 0

    i = 2
    # This will loop i from 2 up to int(sqrt(N))
    while i*i <= N:
        # If we already crossed out this number, then continue
        if is_prime[i] == 0:
            i += 1
            continue

        j = 2*i
        while j < N:
            # Cross out this as it is composite
            is_prime[j] = 0
            # j is incremented by i, because we want to cover all multiples of i
            j += i

        i += 1
    return is_prime


print("total", sum(allPrimes(10000000)))
Answered By: Randiaz