Floating point math in different programming languages

Question:

I know that floating point math can be ugly at best, but I am wondering if somebody can explain the following quirk. In most of the programming languages I tested, adding 0.4 to 0.2 gave a slight error, whereas 0.4 + 0.1 + 0.1 gave none.

What is the reason for the inequality of the two calculations, and what measures can one take in the respective programming languages to obtain correct results?

In Python 2/3:

.4 + .2
0.6000000000000001
.4 + .1 + .1
0.6

The same happens in Julia 0.3:

julia> .4 + .2
0.6000000000000001

julia> .4 + .1 + .1
0.6

and Scala:

scala> 0.4 + 0.2
res0: Double = 0.6000000000000001

scala> 0.4 + 0.1 + 0.1
res1: Double = 0.6

and Haskell:

Prelude> 0.4 + 0.2
0.6000000000000001    
Prelude> 0.4 + 0.1 + 0.1
0.6

but R v3 gets it right:

> .4 + .2
[1] 0.6
> .4 + .1 + .1
[1] 0.6
Asked By: vchuravy


Answers:

The reason is that the result is rounded at the end, as required by the IEEE Standard for Floating-Point Arithmetic:

http://en.wikipedia.org/wiki/IEEE_754

According to the standard, addition, multiplication, and division should be correctly rounded, i.e. correct all the way up to the last bit. This is because a computer has only a finite amount of space to represent these values and cannot carry infinite precision.
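To see that an individual operation really is correctly rounded, here is a small sanity check, sketched in Python using the standard-library fractions module to do the intermediate arithmetic exactly:

from fractions import Fraction

# Fraction(x) converts a float to its exact rational value, so this sum is exact.
exact_sum = Fraction(0.4) + Fraction(0.2)

# float() rounds the exact rational back to the nearest double; if the result
# matches 0.4 + 0.2, the addition was correctly rounded as the standard requires.
print(float(exact_sum) == 0.4 + 0.2)  # True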

Answered By: Nowayz

All these languages are using the system-provided floating-point format, which represents values in binary rather than in decimal. Values like 0.2 and 0.4 can’t be represented exactly in that format, so instead the closest representable value is stored, resulting in a small error. For example, the numeric literal 0.2 results in a floating-point number whose exact value is 0.200000000000000011102230246251565404236316680908203125. Similarly, any given arithmetic operation on floating-point numbers may result in a value that’s not exactly representable, so the true mathematical result is replaced with the closest representable value. These are the fundamental reasons for the errors you’re seeing.
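You can inspect that exact stored value directly; for example, in Python the standard-library decimal module converts a float to its exact decimal equivalent:

from decimal import Decimal

# Converting a float to Decimal is exact, so this prints the precise binary64
# value that the literal 0.2 actually stores.
print(Decimal(0.2))
# 0.200000000000000011102230246251565404236316680908203125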

However, this doesn’t explain the differences between languages: in all of your examples, the exact same computations are being made and the exact same results are being arrived at. The difference then lies in the way that the various languages choose to display the results.

Strictly speaking, none of the answers you show is correct. Making the (fairly safe) assumption of IEEE 754 binary64 arithmetic with a round-to-nearest rounding mode, the exact value of the first sum is:

0.600000000000000088817841970012523233890533447265625

while the exact value of the second sum is:

0.59999999999999997779553950749686919152736663818359375
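If you want to verify those values yourself, the same exact-conversion trick works on the computed sums; here is a Python sketch (the two expressions below produce the binary64 results being discussed):

from decimal import Decimal

print(Decimal(0.4 + 0.2))
# 0.600000000000000088817841970012523233890533447265625
print(Decimal(0.4 + 0.1 + 0.1))
# 0.59999999999999997779553950749686919152736663818359375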

However, neither of those outputs is particularly user-friendly, and clearly all of the languages you tested made the sensible decision to abbreviate the output when printing. But they don't all adopt the same strategy for formatting the output, which is why you're seeing differences.

There are many possible strategies for formatting, but three particularly common ones (demonstrated in the short sketch after this list) are:

  1. Compute and display 17 correctly-rounded significant digits, possibly stripping trailing zeros where they appear. The output of 17 digits guarantees that distinct binary64 floats will have distinct representations, so that a floating-point value can be unambiguously recovered from its representation; 17 is the smallest integer with this property. This is the strategy that Python 2.6 uses, for example.

  2. Compute and display the shortest decimal string that rounds back to the given binary64 value under the usual round-ties-to-even rounding mode. This is rather more complicated to implement than strategy 1, but preserves the property that distinct floats have distinct representations, and tends to make for pleasanter output. This appears to be the strategy that all of the languages you tested (besides R) are using.

  3. Compute and display 15 (or fewer) correctly-rounded significant digits. This has the effect of hiding the errors involved in the decimal-to-binary conversions, giving the illusion of exact decimal arithmetic. It has the drawback that distinct floats can have the same representation. This appears to be what R is doing. (Thanks to @hadley for pointing out in the comments that there’s an R setting which controls the number of digits used for display; the default is to use 7 significant digits.)
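For illustration, here is a minimal Python sketch of the three strategies applied to the first sum (the repr line assumes Python 2.7 or 3.x, which use the shortest-string strategy):

x = 0.4 + 0.2

print("%.17g" % x)  # strategy 1: 17 significant digits -> 0.60000000000000009
print(repr(x))      # strategy 2: shortest round-tripping string -> 0.6000000000000001
print("%.15g" % x)  # strategy 3: 15 significant digits -> 0.6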

Answered By: Mark Dickinson

You should be aware that 0.6 cannot be exactly represented in IEEE floating point, and neither can 0.4, 0.2, or 0.1. This is because the ratio 1/5 is an infinitely repeating fraction in binary, just as ratios such as 1/3 and 1/7 are in decimal. Since none of your initial constants is exact, it is not surprising that your results are not exact either. (Note: if you want to get a better handle on this lack of exactness, try subtracting the value you expect from your computed results…)
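For example, in Python the residual error is easy to expose that way:

# Subtracting the value you expected shows the size of the rounding error.
print((0.4 + 0.2) - 0.6)        # 1.1102230246251565e-16 (one ulp of 0.6)
print((0.4 + 0.1 + 0.1) - 0.6)  # 0.0 -- this sum happens to land on the double nearest 0.6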

There are a number of other potential gotchas in the same vein. For instance, floating point arithmetic is only approximately associative: adding the same set of numbers together in different orders will usually give you slightly different results (and occasionally can give you very different results). So, in cases where precision is important, you should be careful about how you accumulate floating point values.
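Here is a short Python illustration of that order dependence, with math.fsum shown as one way to accumulate more carefully (it returns a correctly rounded sum of the whole sequence):

import math

# The same three values, added in two different orders:
print((0.1 + 0.2) + 0.3)           # 0.6000000000000001
print(0.1 + (0.2 + 0.3))           # 0.6

# math.fsum tracks the intermediate rounding errors and returns the
# correctly rounded total of the input values.
print(math.fsum([0.1, 0.2, 0.3]))  # 0.6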

The usual advice for this situation is to read “What Every Computer Scientist Should Know About Floating Point Arithmetic”, by David Goldberg. The gist: floating point is not exact, and naive assumptions about its behavior may not be supported.

Answered By: comingstorm