float64 with pandas to_csv

Question:

I’m reading a CSV with float numbers like this:

Bob,0.085
Alice,0.005

I import it into a DataFrame and then write that DataFrame to a new file:

import pandas as pd
df = pd.read_csv(orig)
df.to_csv(pandasfile)

Now pandasfile contains:

Bob,0.085000000000000006
Alice,0.0050000000000000001

What happened? Do I have to cast to a different type, like float32?

I'm using pandas 0.9.0 and numpy 1.6.2.

Asked By: avances123


Answers:

As mentioned in the comments, it is a general floating point problem.
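The floating-point side of this can be made visible with the standard library alone: 0.085 and 0.005 have no exact binary representation, and printing a float64 with 17 significant digits (always enough to round-trip a double) reproduces the digits shown in the question. That the old pandas writer used this precision is an assumption inferred from the output, not something the question confirms.

```python
from decimal import Decimal

for literal in ("0.085", "0.005"):
    value = float(literal)        # nearest float64 to the decimal text
    shortest = repr(value)        # shortest string that parses back to the same float
    seventeen = "%.17g" % value   # 17 significant digits: round-trips, but shows binary noise
    exact = Decimal(value)        # the exact binary value actually stored
    print(literal, shortest, seventeen, exact)
```

Both strings parse back to the same float; the 17-digit one just exposes more of the stored value.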

However, you can use the float_format keyword of to_csv to hide it:

df.to_csv('pandasfile.csv', float_format='%.3f')

or, if you don’t want 0.0001 to be rounded to zero:

df.to_csv('pandasfile.csv', float_format='%g')

will give you:

Bob,0.085
Alice,0.005

in your output file.

For an explanation of %g, see Format Specification Mini-Language.
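A small sketch comparing the two formats, writing to an in-memory buffer instead of a file. The third row (Carol, 0.0001) is hypothetical data added to show where the formats differ; index=False only keeps the row index out of the output.

```python
import io

import pandas as pd

# Rows from the question plus one hypothetical small value
df = pd.DataFrame({"name": ["Bob", "Alice", "Carol"],
                   "value": [0.085, 0.005, 0.0001]})


def csv_text(frame, float_format):
    """Write frame to an in-memory CSV and return the text."""
    buf = io.StringIO()
    frame.to_csv(buf, index=False, float_format=float_format)
    return buf.getvalue()


fixed = csv_text(df, "%.3f")   # always 3 decimals: 0.0001 becomes 0.000
general = csv_text(df, "%g")   # up to 6 significant digits: 0.0001 survives
print(fixed)
print(general)
```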

Answered By: bmu

UPDATE: The answer was accurate at the time of writing, and floating-point precision is still not something you get by default with to_csv/read_csv (a precision-performance tradeoff; the defaults favor performance).

Nowadays there is the float_format argument for pandas.DataFrame.to_csv and the float_precision argument for pandas.read_csv.

The original is still worth reading to get a better grasp on the problem.
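A sketch of both sides of the round trip on a recent pandas version. The header row is an assumption added so the sample parses cleanly. When float_format is not given, recent pandas appears to write the shortest repr of each float, which is why the output matches the input here; older versions such as 0.9.0 did not.

```python
import io

import pandas as pd

# The question's rows, with a hypothetical header line
csv_in = "name,value\nBob,0.085\nAlice,0.005\n"

# float_precision="round_trip" selects the round-trip float converter
df = pd.read_csv(io.StringIO(csv_in), float_precision="round_trip")

buf = io.StringIO()
df.to_csv(buf, index=False)  # no float_format given
csv_out = buf.getvalue()
print(csv_out)
```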


It was a bug in pandas, not only in the “to_csv” function but in “read_csv” too. It’s not a general floating-point issue, although it is true that floating-point arithmetic is a subject that demands some care from the programmer. The following article clarifies this subject a bit:

http://docs.python.org/2/tutorial/floatingpoint.html

A classic one-liner which shows the “problem” is …

>>> 0.1 + 0.1 + 0.1
0.30000000000000004

… which does not display 0.3 as one would expect. On the other hand, if you do the calculation in fixed-point (integer) arithmetic and switch to floating-point arithmetic only in the last step, it works as you expect. See this:

>>> (1 + 1 + 1) * 1.0 / 10
0.3

If you desperately need to circumvent this problem, I recommend creating another CSV file that contains all figures as integers, for example by multiplying by 100, 1000 or another convenient factor. Inside your application, read the CSV file as usual and you will get those integer figures back. Then convert the values to floating point by dividing by the same factor you multiplied by.
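The scaled-integer workaround could look like this with pandas; SCALE = 1000 and the column names are hypothetical. Rounding before the integer cast matters because value * 1000 is itself a floating-point product and may land slightly off the whole number. Dividing the integers back out gives the float64 closest to each original decimal, because the division is correctly rounded.

```python
import io

import pandas as pd

SCALE = 1000  # hypothetical factor, enough for three decimal places

df = pd.DataFrame({"name": ["Bob", "Alice"], "value": [0.085, 0.005]})

# Write scaled integers; round() first because value * SCALE may not be exactly whole
scaled = df.assign(value=(df["value"] * SCALE).round().astype("int64"))
buf = io.StringIO()
scaled.to_csv(buf, index=False)
csv_text = buf.getvalue()

# Read the integers back and divide by the same factor
restored = pd.read_csv(io.StringIO(csv_text))
restored["value"] = restored["value"] / SCALE
print(csv_text)
print(restored)
```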

Answered By: Richard Gomes

I have encountered this problem, and this is the solution I found.
(I tried the other solution, but it didn’t work correctly.)

First, round to the desired number of decimals, and then export to CSV.

Just try the following:

df = df.astype(float).round(3)
df.to_csv('pandasfile.csv')
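A variant of the same idea on hypothetical data that actually carries rounding noise (0.1 + 0.1 + 0.1). It calls df.round(3) without astype(float): DataFrame.round leaves non-numeric columns such as the names unchanged, whereas astype(float) would raise on a text column.

```python
import io

import pandas as pd

# 0.1 + 0.1 + 0.1 is stored as 0.30000000000000004
df = pd.DataFrame({"name": ["Bob", "Alice"], "value": [0.1 + 0.1 + 0.1, 0.085]})

# Only numeric columns are rounded; "name" passes through as-is
rounded = df.round(3)

buf = io.StringIO()
rounded.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```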
Answered By: Lafo_R