Comparing Python with C/Fortran


I wrote the following programs to compare the speed of Python with C/Fortran.
To get the time used by the programs I used the “time” command. All the
programs compute the square root of x*x+y*y+z*z, where x, y, z are floats.
I used the square root because it is one of the most time-consuming parts of
scientific computing, which is the field I work in.

I got the following times:

fortran   0m29.9s
c         0m20.7s
python   30m10.8s

Based on this simple test, Python does not look suitable for
scientific computing. But my code is probably very inefficient.

Do you think I could make my code more efficient just for this simple test case?


program root_square
implicit none

integer i,j
real x,y,z,r

x=1.0
y=2.0
z=3.0

do j=1,3000
    do i=1,1000000
        r=sqrt(x*x+y*y+z*z)
    enddo
enddo

end program root_square


#include <stdio.h>
#include <math.h>

int main (void)
{
    float x=1.0,y=2.0,z=3.0,r;
    int i,j;

    for(j=0; j<3000; j++){
        for(i=0; i<1000000; i++) {
            r=sqrt(x*x+y*y+z*z);
        }
    }
    return 0;
}


#!/usr/bin/env python

from math import sqrt

x = 1.0
y = 2.0
z = 3.0

for j in range(1,3001):
  for i in range(1,1000001):
    r = sqrt(x*x+y*y+z*z)

Asked By: armando



You have not explained exactly what the goal of your measurement is, so it is very hard to answer whether or not your test code is going to adequately provide you with information to satisfy that goal. In general, benchmarks exist to tell you something very specific — you should know exactly what you’re trying to figure out by conducting the benchmark. Microbenchmarks, of the type you’re trying above, are also notorious for providing distorted answers…

Answered By: Perry

As a rule, numpy is used for scientific calculations in Python. You should probably test that library.

Answered By: Scorpil

You probably can.
There are a number of math libraries for Python that can probably do this task quite a bit more efficiently.
Since Python ranges work quite differently from C loops, I would try to unroll these loops first.

Answered By: user1127914

Be aware that the calculation of r does not depend on the loop variables, so an optimizing compiler may move the calculation out of the loop, and just run the empty loop for the requested number of times; or even remove that loop completely and only do the calculation of the square root.
A really smart optimizer may notice that you’re not doing anything with the result, so the complete program may be optimized away without altering the output (i.e. nothing).
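
The same concern affects how the benchmark kernel should be written in any of the three languages. Here is a minimal sketch in Python; the function name `kernel` and the iteration count are only illustrative and do not come from the question. The input depends on the loop index, so the square root is not loop-invariant, and the results are accumulated and returned, so the work cannot be discarded as unused. CPython's bytecode compiler does not hoist the call itself, so the pattern matters mostly when the same kernel is ported to C or Fortran.

```python
from math import sqrt

def kernel(n):
    """Sum sqrt(x*x + y*y + z*z) over n points that depend on the loop index."""
    total = 0.0
    for i in range(n):
        x = 1.0 + i  # depends on i, so the sqrt is not loop-invariant
        y = 2.0
        z = 3.0
        total += sqrt(x * x + y * y + z * z)  # result is used, not thrown away
    return total

if __name__ == "__main__":
    # Printing the result makes the computation observable
    print(kernel(1000))
```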

Answered By: eriktous

There are a number of things you should be aware of before you start comparing timings like that.

  1. As mentioned in another answer, the compiler could optimize away the loop and the actual value. Furthermore, even if you print the result, it could just pre-compute the square root.
  2. You are using real in Fortran and float in C, so (depending on your system, of course) the compiler will probably use the sqrtf library call in Fortran, while in C you use sqrt instead of sqrtf, which is what you should use for a float.
  3. In Python, you should use the numpy and scipy packages; they provide arrays on which you can do efficient whole-array operations, avoiding the looping in Python.
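
A minimal sketch of what such a whole-array operation could look like with numpy. The array length and the constant input values are assumptions chosen to mirror the question's inner loop, not measurements from it.

```python
import numpy as np

n = 1_000_000
# Hypothetical input data: one million points instead of one fixed triple
x = np.full(n, 1.0)
y = np.full(n, 2.0)
z = np.full(n, 3.0)

# One vectorized expression replaces the inner Python loop;
# the iteration happens inside numpy's compiled code
r = np.sqrt(x * x + y * y + z * z)
print(r[0])
```

Note that numpy arrays default to double precision here, so this is not directly comparable to the single-precision Fortran and C versions.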

Answered By: steabert

Flawed benchmark.

If you want to time floating point arithmetic, then you should first time the loops doing nothing (or as close to nothing as you can manage). To avoid optimizing away the whole loop, make sure it is doing something like moving a single byte char from one array to another.

Then time it again with the floating point calculation and subtract the first timing to get a more accurate number.
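
The subtraction described above can be sketched in Python with the standard `timeit` module. The iteration count `N` and the helper names (`empty_loop`, `sqrt_loop`, `best_time`) are illustrative assumptions; the count is far smaller than in the question so the script finishes quickly.

```python
import timeit
from math import sqrt

N = 200_000  # illustrative; the question uses 3000 * 1000000 iterations

def empty_loop(n=N):
    """Baseline: loop overhead only."""
    for _ in range(n):
        pass

def sqrt_loop(n=N, x=1.0, y=2.0, z=3.0):
    """Same loop plus the floating point work being measured."""
    r = 0.0
    for _ in range(n):
        r = sqrt(x * x + y * y + z * z)
    return r

def best_time(func, repeat=5):
    """Minimum wall time over several runs, which reduces scheduling noise."""
    return min(timeit.repeat(func, number=1, repeat=repeat))

if __name__ == "__main__":
    baseline = best_time(empty_loop)
    total = best_time(sqrt_loop)
    per_call = (total - baseline) / N
    print(f"loop overhead: {baseline:.4f} s")
    print(f"with sqrt:     {total:.4f} s")
    print(f"approx. cost per sqrt expression: {per_call * 1e9:.1f} ns")
```

In CPython the loop overhead is a large share of the total, which is part of why the pure-Python timing in the question is so far from the compiled ones.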

Also, Python only has double-precision floating point numbers, so a more even test would ensure that the other languages also use double precision. And as others have mentioned, Python is widely used for scientific computing, but those scientists generally use the numpy library to do matrix calculations rather than writing Python loops.
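
A small sketch to illustrate the precision point. It checks the mantissa width of Python's float and rounds the question's result to single precision with the standard `struct` module; the helper `to_single` is a hypothetical name introduced only for this example.

```python
import struct
import sys
from math import sqrt

# Number of mantissa bits in Python's float (53 for an IEEE 754 double)
print(sys.float_info.mant_dig)

def to_single(value):
    """Round a Python float to single precision and back (illustrative helper)."""
    return struct.unpack("f", struct.pack("f", value))[0]

r_double = sqrt(1.0 * 1.0 + 2.0 * 2.0 + 3.0 * 3.0)
r_single = to_single(r_double)

# The two values differ in the low-order digits
print(r_double, r_single, abs(r_double - r_single))
```

So the Python program in the question does its arithmetic in double precision, while `real` and `float` in the other two programs are typically single precision.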

Answered By: Michael Dillon

For calculations I might try Haskell or ML…

Try this code in ML:

fun trip(x,y,z) = if y=z then 0
    else trip(((Math.sqrt((1.0*1.0)+(2.0*2.0)+(3.0*3.0)))*1.0),(y+1),z);

Answered By: Boaz Tirosh

I have recently done a similar test with a more realistic real-world algorithm. It involves numpy, Matlab, FORTRAN and C# (via ILNumerics). Without specific optimizations, numpy appears to generate much less efficient code than the others. Of course, as always, this can only suggest a general trend. You will be able to write FORTRAN code which in the end runs slower than a corresponding numpy implementation. But most of the time, numpy will be much slower. Here are the (averaged) results of my test:

[Figure: kmeans comparison results]

When timing such simple floating point operations as in your example, it all comes down to the compiler's ability to generate ‘optimal’ machine instructions. Here, it is not so important how many compilation steps are involved. .NET and numpy use more than one step, first compiling to byte code which then executes in a virtual machine. But the options to optimize the result exist equally, in theory. In practice, modern FORTRAN and C compilers are better at optimizing for execution speed. As one example, they utilize floating point extensions (SSE, AVX) and do better loop unrolling. numpy (or rather CPython, which is what numpy mostly runs on) seems to perform worse at this point. If you want to find out which framework is best for your task, you may attach a debugger and investigate the final machine instructions of the executable.

However, keep in mind that in a more realistic scenario floating point performance only matters at the very end of a long optimization chain. The difference is often masked by a much stronger effect: memory bandwidth. As soon as you start handling arrays (which is common in most scientific applications) you will have to take the cost of memory management into account. Frameworks differ in how well they support the algorithm author in writing memory efficient algorithms. In my opinion numpy makes it harder to write memory efficient algorithms than FORTRAN or C. But it is not easy in any of those languages. (ILNumerics improves this considerably.)
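
To illustrate what paying attention to memory can mean in numpy, here is a minimal sketch. The array size and variable names are assumptions for the example and are not taken from the test above. The first form lets numpy allocate a new temporary array for each intermediate result; the second reuses two preallocated buffers through the `out=` argument and in-place addition.

```python
import numpy as np

n = 1_000_000
x = np.full(n, 1.0)
y = np.full(n, 2.0)
z = np.full(n, 3.0)

# Straightforward form: each intermediate product and sum is a new array
r_simple = np.sqrt(x * x + y * y + z * z)

# Buffer-reusing form: one result array and one scratch array, updated in place
r = np.empty_like(x)
tmp = np.empty_like(x)
np.multiply(x, x, out=r)    # r = x*x
np.multiply(y, y, out=tmp)  # tmp = y*y
r += tmp                    # r = x*x + y*y
np.multiply(z, z, out=tmp)  # tmp = z*z
r += tmp                    # r = x*x + y*y + z*z
np.sqrt(r, out=r)           # square root computed in place

print(np.allclose(r, r_simple))
```

The second form is noticeably more verbose, which is one way to read the remark that memory efficient code takes extra effort to write. Whether it is actually faster depends on the array sizes and the machine, so it would need to be measured.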

Another important point is parallelization. Does the framework support you in executing your computations in parallel? And how efficiently is it done? Again my personal opinion: neither C nor FORTRAN nor numpy makes it easy to parallelize your algorithms. But FORTRAN and C at least give you the chance to do so, even if it sometimes requires special compilers. Other frameworks (ILNumerics, Matlab) do parallelize automatically.

If you need ‘peak performance’ for very small but costly algorithms, you will mostly be better off using FORTRAN or C, simply because in the end they generate better machine code (on a uniprocessor system). However, writing larger algorithms in C or FORTRAN while taking memory efficiency and parallelism into account often gets cumbersome. Here, higher level languages (like numpy, ILNumerics or Matlab) outdo lower level languages. And if done right, the difference in execution speed is often negligible. Unfortunately, this is often not true for numpy.

Answered By: Haymo Kutschbach