Rounding errors in Python

Question:

Why can the order of multiplications affect the result? Consider the following code:

a = 47.215419672114173
b = -0.45000000000000007
c = -0.91006620964286644
result1 = a * b * c
temp = b * c
result2 = a * temp
result1 == result2

We all know that result1 should be equal to result2, however we get:

result1==result2 #FALSE!

The difference is minimal:

result1-result2 #3.552713678800501e-15

However, in particular applications this error can be amplified, so that the outputs of two programs doing the same computations (one using result1, the other result2) can be completely different.

Why is this so and what can be done to address such issues in heavily numerical/scientific applications?

Thanks!

UPDATE

Good answers, but I'm still missing the reason why the order of multiplication matters, e.g.

temp2 = a * b
result3 = temp2 * c
result1 == result3  # True

So it seems that the compiler/interpreter evaluates a*b*c as (a*b)*c.
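A quick check confirms the left-to-right grouping (using the same values as above):

```python
a = 47.215419672114173
b = -0.45000000000000007
c = -0.91006620964286644

# '*' is left-associative in Python, so a*b*c performs the
# exact same operations as (a*b)*c...
print(a * b * c == (a * b) * c)  # True
# ...while grouping from the right rounds the intermediates differently
print(a * b * c == a * (b * c))  # False
```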

Asked By: Mannaggia


Answers:

All programming languages lose precision when converting floating-point numbers from their decimal representation to a binary one. This results in calculations that look inaccurate from a base-10 perspective (the math is actually being done on floating-point values represented in binary), including cases where the order of operations changes the result. Most languages provide a data structure that maintains base-10 precision, at the cost of performance. Look at Decimal in Python.
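A minimal sketch with the decimal module. Note that Decimal arithmetic also rounds, but at a decimal precision you choose; 50 digits here is an arbitrary choice, large enough that the intermediate products of these particular 17-digit inputs stay exact:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # work with 50 significant decimal digits

a = Decimal("47.215419672114173")
b = Decimal("-0.45000000000000007")
c = Decimal("-0.91006620964286644")

# With enough digits both groupings round the same exact product,
# so the associativity the question expected is restored
print((a * b) * c == a * (b * c))  # True
```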

Edit:

In answer to your update: not exactly. Computers execute operations in order, so when you give them a sequence of operations they proceed through it one by one. There's no explicit order-of-operations machinery beyond that sequential processing, which is why a*b*c is simply evaluated left to right, i.e. as (a*b)*c.

Answered By: Silas Ray

Float comparison should always be done (by you) with a small epsilon, e.g. 1e-10.
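Since Python 3.5 the standard library provides math.isclose for exactly this, using a relative tolerance (1e-09 by default), so you don't need to hand-roll the check:

```python
import math

a = 47.215419672114173
b = -0.45000000000000007
c = -0.91006620964286644

result1 = a * b * c        # grouped as (a*b)*c
result2 = a * (b * c)

print(result1 == result2)              # False: exact comparison fails
print(math.isclose(result1, result2))  # True: equal within relative tolerance
```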

Answered By: Verena Haunschmid

We all know that result1 should be equal to result2, however we get:

No, we don’t all know that. In fact, they should not be equal, which is why they aren’t equal.

You seem to believe that you are working with real numbers. You aren’t – you are working with IEEE floating point representations. They don’t follow the same axioms. They aren’t the same thing.

The order of operations matters because Python evaluates each intermediate expression to a floating-point number, rounding the result at every step.
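You can see that you are dealing with two genuinely different IEEE 754 doubles by printing their exact bit-level values with float.hex():

```python
a = 47.215419672114173
b = -0.45000000000000007
c = -0.91006620964286644

result1 = (a * b) * c
result2 = a * (b * c)

# .hex() shows the exact binary value of each double; the trailing
# hex digits differ, so these are two distinct floats
print(result1.hex())
print(result2.hex())
```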

Answered By: Marcin

When you use floating point numerals in any programming language, you will lose precision. You can either:

Accommodate the loss of precision and adjust your equality checks accordingly, as follows:

 are_equal = abs(result1 - result2) < 0.0001

Where 0.0001 (the epsilon) is a tolerance you set.

Or use the Decimal class provided with Python, which is a bit slower.

Answered By: Lanaru

Why:
Probably your machine/Python cannot handle that amount of accuracy.
See: http://en.wikipedia.org/wiki/Machine_epsilon#Approximation_using_Python

What to do:
This should help: http://packages.python.org/bigfloat/

Answered By: Justin Harris

Each multiplication produces a result with twice as many digits (or bits) as its operands, which must be rounded to fit back into the space allocated for a floating-point number. This rounding can change the result when you rearrange the order of operations.
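This is easy to verify with fractions.Fraction, which carries the products exactly: both groupings start from the same exact rational product, but each float multiplication rounds back to 53 bits at a different point. A sketch with the question's values:

```python
from fractions import Fraction

a = 47.215419672114173
b = -0.45000000000000007
c = -0.91006620964286644

# Fraction(float) is exact, so this is the true rational product
exact = Fraction(a) * Fraction(b) * Fraction(c)

r1 = (a * b) * c
r2 = a * (b * c)

# Each grouping rounds twice, but at different points, so the
# accumulated rounding error differs between the two results
print(abs(Fraction(r1) - exact))
print(abs(Fraction(r2) - exact))
print(r1 == r2)  # False
```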

Answered By: Mark Ransom

Representing numbers in computers is a large research area in computer science. The problem is not specific to Python: every programming language has this property, because making every calculation arbitrarily accurate by default would be too expensive.

The numerical stability of an algorithm reflects some of these limitations and is a central concern when designing numerical algorithms. As said before, Decimal is defined as a standard for precise calculations in banking applications, or any application that might need them. Python ships an implementation of this standard in the decimal module.
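Numerical stability can often be improved algorithmically instead of reaching for slower arbitrary precision. A classic example (a sketch, not tied to this question's code) is compensated, or Kahan, summation, which tracks the low-order bits that plain summation discards; the stdlib's math.fsum provides a correctly rounded sum for comparison:

```python
import math

def kahan_sum(values):
    """Compensated (Kahan) summation: carries the rounding error forward."""
    total = 0.0
    compensation = 0.0               # running estimate of the lost low-order bits
    for x in values:
        y = x - compensation         # correct the next term by the error so far
        t = total + y                # low-order bits of y are lost here...
        compensation = (t - total) - y   # ...and recovered here
        total = t
    return total

values = [0.11] * 10_000
print(sum(values))        # naive summation: error accumulates over 10k additions
print(kahan_sum(values))  # compensated result
print(math.fsum(values))  # correctly rounded reference
```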

Answered By: gutes

As answered well in previous posts, this is a floating-point arithmetic issue common to programming languages. You should never test float types for exact equality.

When you need such comparisons, you can use a function that compares within a given tolerance (threshold). If the numbers are close enough, they should be considered numerically equal. Something like:

def isequal_float(x1, x2, tol=1e-8):
    """Return whether two floats are equal, up to a tolerance."""
    return abs(x1 - x2) < tol

will do the trick. If I'm not mistaken, the appropriate tolerance depends on whether the float type is single- or double-precision, which in turn depends on the language you're using.
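The machine epsilon for each precision is exposed directly, so a tolerance can be derived from it instead of hard-coded. A sketch (assumes numpy, which this answer already uses):

```python
import sys
import numpy as np

# Gap between 1.0 and the next representable double (~2.22e-16)
print(sys.float_info.epsilon)

# numpy exposes the same value per dtype
print(np.finfo(np.float64).eps)
print(np.finfo(np.float32).eps)  # single precision is much coarser
```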

Using such a function allows you to easily compare the results of calculations, for instance in numpy. Take the following example, where the correlation matrix of a dataset with continuous variables is calculated in two ways: with the pandas method pd.DataFrame.corr() and with the numpy function np.corrcoef():

import numpy as np
import seaborn as sns

iris = sns.load_dataset('iris')
iris.drop('species', axis=1, inplace=True)

# calculate the correlation coefficient matrix using two different methods
cor1 = iris.corr().to_numpy()
cor2 = np.corrcoef(iris.transpose())

print(cor1)
print(cor2)

The results seem similar:

[[ 1.         -0.11756978  0.87175378  0.81794113]
 [-0.11756978  1.         -0.4284401  -0.36612593]
 [ 0.87175378 -0.4284401   1.          0.96286543]
 [ 0.81794113 -0.36612593  0.96286543  1.        ]]
[[ 1.         -0.11756978  0.87175378  0.81794113]
 [-0.11756978  1.         -0.4284401  -0.36612593]
 [ 0.87175378 -0.4284401   1.          0.96286543]
 [ 0.81794113 -0.36612593  0.96286543  1.        ]]

but exact equality does not hold between them. These comparisons:

print(cor1 == cor2)
print(np.equal(cor1, cor2))

will yield mostly False results element-wise:

[[ True False False False]
 [False False False False]
 [False False False False]
 [False False False  True]]

Likewise, np.array_equal(cor1, cor2) will also yield False. However, the custom-made function gives the comparison you want:

out = [isequal_float(i, j) for i, j in zip(cor1.ravel(), cor2.ravel())]
print(out)

[True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]

Note: numpy provides the np.allclose() function to compare whole arrays within a tolerance (np.isclose() is the element-wise version):

print(np.allclose(cor1, cor2))  # True
Answered By: dbouz

Some great answers here about how to deal with floating-point arithmetic. But you seem to be asking more specifically why a*b*c != a*(b*c) [result1 != result2]. The answer is simple: floating-point arithmetic is not guaranteed to be associative.

When you assigned temp = b*c, the computer already made an imprecise calculation (because it rounded the result), and that error propagated into result2 = a*temp.
On the other hand, when you calculated result1 = a*b*c, the rounding started with the intermediate result a*b and propagated through the multiplication by c. Since the rounding happens at different points, the two final results differ. The same effect explains why, e.g., summing the number 0.11 ten thousand times gives far more imprecision than computing 0.11 * 10000 directly: the error propagates over many operations.
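The accumulation effect can be checked directly, using math.fsum as a correctly rounded reference (a sketch):

```python
import math

n = 10_000
x = 0.11

looped = 0.0
for _ in range(n):
    looped += x          # rounds after every addition: errors accumulate

direct = x * n           # a single multiplication: rounds only once
reference = math.fsum([x] * n)  # correctly rounded sum of the float values

print(abs(looped - reference))  # many rounding steps of accumulated error
print(abs(direct - reference))  # at most a rounding or two
```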

If you want some more in depth knowledge about this topic, you can read the Python doc about floating points, the article What Every Computer Scientist Should Know About Floating-Point Arithmetic or any introduction to numerical analysis/methods available on many courses/books.

Answered By: Shimada_R.