Pandas Dataframe Comparison and Floating Point Precision

Question:

I want to compare two dataframes that should be identical. However, because of floating-point precision, the comparison reports that the values don't match. I have created an example below that reproduces the problem. How can I get the final comparison dataframe to return True for both cells?

import pandas as pd

a = pd.DataFrame({'A':[100,97.35000000001]})
b = pd.DataFrame({'A':[100,97.34999999999]})
print(a)

   A  
0  100.00  
1   97.35  

print(b)

   A  
0  100.00  
1   97.35  

print (a == b)

   A  
0  True  
1  False  
Asked By: PH82


Answers:

You can use np.isclose for this, which compares values within a tolerance instead of exactly:

In [250]:
np.isclose(a,b)

Out[250]:
array([[ True],
       [ True]], dtype=bool)

np.isclose takes a relative tolerance (rtol) and an absolute tolerance (atol), which default to rtol=1e-05 and atol=1e-08. Two values a and b are treated as equal when abs(a - b) <= atol + rtol * abs(b).
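As a rough sketch of how the tolerances behave on the frames from the question: the two values in row 1 differ by about 2e-11, so the defaults accept them, while a much tighter absolute tolerance rejects them again. The names close_df, all_close and strict are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'A': [100, 97.35000000001]})
b = pd.DataFrame({'A': [100, 97.34999999999]})

# Element-wise check with the default tolerances (rtol=1e-05, atol=1e-08)
close = np.isclose(a, b)

# np.isclose returns a plain ndarray; wrap it to keep the index and column labels
close_df = pd.DataFrame(close, index=a.index, columns=a.columns)
print(close_df)

# Collapse to a single boolean for the whole frame
all_close = bool(close.all())
print(all_close)

# With rtol=0 and atol=1e-12 the ~2e-11 difference in row 1 is no longer tolerated
strict = np.isclose(a, b, rtol=0, atol=1e-12)
print(strict)
```

Wrapping the result in a DataFrame is optional; it just gives the same shape of output as a == b in the question.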

Answered By: EdChum

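You can use pandas' built-in assert_frame_equal, which by default compares floating-point columns approximately, much like numpy's isclose(). The advantage is that you can pass an entire dataframe with mixed column types. Note that it returns nothing when the frames match and raises an AssertionError when they differ.

To fine-tune the comparison, see the rtol and atol arguments.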

from pandas.testing import assert_frame_equal

assert_frame_equal(df1, df2)
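Since assert_frame_equal raises instead of returning a boolean, one way to get a True/False answer is to wrap it. The sketch below assumes a hypothetical helper named frames_match and adds an illustrative string column ('label') to the frames from the question to show the mixed-type case; neither is part of the original answer.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Frames from the question, plus an illustrative non-numeric column
a = pd.DataFrame({'A': [100, 97.35000000001], 'label': ['x', 'y']})
b = pd.DataFrame({'A': [100, 97.34999999999], 'label': ['x', 'y']})


def frames_match(left, right, **kwargs):
    """Hypothetical helper: True if assert_frame_equal passes, False otherwise."""
    try:
        assert_frame_equal(left, right, **kwargs)
    except AssertionError:
        return False
    return True


# Default behaviour: float columns are compared within a tolerance
approx_equal = frames_match(a, b)

# check_exact=True switches to exact comparison, so the tiny difference is reported
exact_equal = frames_match(a, b, check_exact=True)

print(approx_equal, exact_equal)
```

Any keyword arguments, such as check_exact here, are passed straight through to assert_frame_equal.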

Answered By: Edward Gaere