How to prevent floating-point errors with pandas

Question:

I have a problem with my Python code. I’m using pandas to read a dataset and store it in a DataFrame. I’m now trying to convert ug to mg (1000 ug == 1 mg) and g to mg (1000 mg == 1 g).

First, I convert the dtype of the column to float64:

df[data_column] = df[data_column].astype("float64")

After that, I select all the rows whose unit is ug and multiply their values by 0.001, then multiply the values of the rows whose unit is g by 1000:

df.loc[df[unit_colum] == "g", [data_column]] *= 1000
df.loc[df[unit_colum] == "ug", [data_column]] *= 0.001

Btw:
I know that I can also divide values in pandas, but in the end this code should run in a loop where it also converts other units (e.g. l -> ml).

My question now is:
Is there any chance that a floating-point error occurs, and what is the best way to prevent it?

I already thought about not converting the DataFrame columns to float64 and just working with the strings, but this isn’t my preferred way.

Asked By: Yanni2

||

Answers:

It is difficult to fully avoid floating point errors in general.

You have two major options to avoid/limit them:

  • perform your computations in the smallest available unit (here µg) as integers
  • round the values to the desired precision after conversion
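
As a rough sketch of both options (the sample data and the column names value, unit, value_ug and value_mg are made up for illustration, and decimal.Decimal is my own choice for parsing the strings without going through float first):

```python
from decimal import Decimal

import pandas as pd

# Hypothetical sample data; values are kept as strings, as read from a file.
df = pd.DataFrame({
    "value": ["1.1", "250", "0.003"],
    "unit": ["g", "ug", "mg"],
})

# Option 1: store everything as integer micrograms.
# Decimal parses the text exactly, so no binary rounding happens on the way.
UG_PER_UNIT = {"ug": 1, "mg": 1_000, "g": 1_000_000}
df["value_ug"] = [
    int(Decimal(v) * UG_PER_UNIT[u]) for v, u in zip(df["value"], df["unit"])
]
print(df["value_ug"].tolist())  # -> [1100000, 250, 3]

# Option 2: convert in float64, then round to the precision you care about
# (6 decimal places of a milligram here, which is an arbitrary choice).
MG_PER_UNIT = {"ug": 0.001, "mg": 1.0, "g": 1000.0}
df["value_mg"] = (
    df["value"].astype("float64") * df["unit"].map(MG_PER_UNIT)
).round(6)
```

Note that int() truncates, so option 1 assumes no input has more precision than one microgram.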

Also, a tip for your conversion: rather than using multiple lines, you can map the factors:

factors = {'ug': 0.001, 'g': 1000, 'mg': 1}

df['data_column'] *= df['unit_column'].map(factors)
Answered By: mozway

Going for integers in a known unit is certainly a good option with easy to understand error bounds and good performance. It’s effectively the same as using floating point with an absolute error threshold.

You can also switch to fractions. This should be done starting with the conversion from strings, since that avoids all floating-point effects. In particular, Fraction("0.01") != Fraction(0.01), because the float literal 0.01 is already a binary approximation before Fraction sees it, but Fraction("0.01") == Fraction("0.1") / Fraction(10).
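
A small standard-library check of that difference (the variable names are just for illustration):

```python
from fractions import Fraction

exact = Fraction("0.01")     # parsed from decimal text: exactly 1/100
from_float = Fraction(0.01)  # exact value of the nearest binary double

print(exact)                 # 1/100
print(exact == from_float)   # False
print(Fraction("0.1") / Fraction(10) == exact)  # True

# Unit conversion on string-parsed fractions stays exact.
print(Fraction("1.1") * 1000 == 1100)  # True
```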

This should work:

import fractions

# Parse the string values directly into exact fractions (no float64 step).
df[data_column] = df[data_column].map(fractions.Fraction)
df.loc[df[unit_colum] == "g", [data_column]] *= fractions.Fraction(1000)
df.loc[df[unit_colum] == "ug", [data_column]] *= fractions.Fraction(1, 1000)
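
For a self-contained variant to experiment with, here is a minimal sketch that combines exact fractions with a factor table. The sample data, the column names and the to_mg dictionary are assumptions for illustration, not part of the original code:

```python
from fractions import Fraction

import pandas as pd

# Hypothetical sample data; values are kept as strings, as read from a file.
df = pd.DataFrame({
    "value": ["1.1", "250", "0.003"],
    "unit": ["g", "ug", "mg"],
})

# Exact conversion factors to milligrams.
to_mg = {"ug": Fraction(1, 1000), "mg": Fraction(1), "g": Fraction(1000)}

# Parse each string into a Fraction and scale it; the column has object dtype.
df["value_mg"] = [
    Fraction(v) * to_mg[u] for v, u in zip(df["value"], df["unit"])
]

# Convert to float only at the very end, e.g. for display or export.
df["value_mg_float"] = df["value_mg"].map(float)
```

Keep in mind that an object column of Fraction values is much slower than float64 or integer columns, so this mainly makes sense when exactness matters more than speed.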
Answered By: Homer512