The documentation for the argument in this post’s title says:
float_precision : string, default None
Specifies which converter the C engine should use for floating-point values. The options are None for the ordinary converter, high for the high-precision converter, and round_trip for the round-trip converter.
I’d like to learn more about the three algorithms mentioned, preferably without having to dig into the source code¹.
Q: Do these algorithms have names I can Google for to learn exactly what they do and how they differ?
(Also, one side question: what exactly is "the C engine" in this context? Is that a Pandas-specific thing, or a Python-wide thing? None of the above?)
¹ Not being familiar with the code base in question, I expect it would take me a long time just to locate the relevant source code. But even assuming I manage to find it, my experience with this sort of algorithm is that their implementations are so highly optimized, and at such a low level, that without some high-level description it is really difficult, at least for me, to follow what’s going on.
You asked about the actual algorithms. The closest I can find is the following mapping from each option to the converter it selects, taken from a related answer, kudos to MaxU (Understanding pandas.read_csv() float parsing):
Ordinary: double_converter_nogil = xstrtod
High: double_converter_nogil = precise_xstrtod
Round-Trip: double_converter_withgil = round_trip
From here, you’re in C-land. You also asked what "the C engine" is – it is a pandas-specific thing: critical code paths in pandas, including the default read_csv parser, are written in Cython or C.
These options represent three different approaches to converting characters to a float. The difference is mostly in the precision. While the question did not ask for the code, the code defines the algorithm and is informative.
The legacy (ordinary) option uses xstrtod (which is closely related to this code: https://github.com/WarrenWeckesser/textreader/blob/master/src/xstrtod.c).
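To give a feel for what a fast converter of this kind does, here is a rough Python sketch. It is not the pandas C source: the name naive_strtod and the exact structure are my own assumptions, and it only handles well-formed input. The idea is that digits are accumulated straight into a double, and the decimal exponent is then applied with a loop of multiplications or divisions, each of which can round.

```python
def naive_strtod(text: str) -> float:
    """Hypothetical sketch of a fast, slightly lossy string-to-double parser.

    Assumes well-formed input such as '-12.5e-3'; no error handling.
    """
    digits = "0123456789"
    pos = 0
    negative = False
    if pos < len(text) and text[pos] in "+-":
        negative = text[pos] == "-"
        pos += 1

    number = 0.0   # running value; may be rounded once it exceeds 2**53
    exponent = 0   # power of ten that still has to be applied

    # Integer digits: number = number * 10 + digit
    while pos < len(text) and text[pos] in digits:
        number = number * 10.0 + int(text[pos])
        pos += 1

    # Fractional digits: same accumulation, but remember the decimal shift
    if pos < len(text) and text[pos] == ".":
        pos += 1
        while pos < len(text) and text[pos] in digits:
            number = number * 10.0 + int(text[pos])
            exponent -= 1
            pos += 1

    # Optional exponent part, e.g. 'e-3' or 'E+10'
    if pos < len(text) and text[pos] in "eE":
        exponent += int(text[pos + 1:])

    # Apply 10**exponent by repeated squaring; every step can round
    p10 = 10.0
    n = abs(exponent)
    while n:
        if n & 1:
            number = number / p10 if exponent < 0 else number * p10
        n >>= 1
        p10 *= p10

    return -number if negative else number


if __name__ == "__main__":
    for s in ["0.5", "123", "-2.5", "1.5e3", "3.141592653589793"]:
        print(s, naive_strtod(s), float(s))
```

Because several intermediate steps can round, the result may land a few units in the last place away from the nearest double. That is the precision trade-off a fast path like this makes.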
The high option uses precise_xstrtod.
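A rough Python sketch of the idea behind a higher-precision converter, assuming it caps the number of significant digits and scales once with a precomputed power of ten instead of looping. The names table_strtod, POWERS_OF_TEN and MAX_DIGITS are hypothetical, the cap of 17 digits is an assumption, and none of this is the pandas C source.

```python
# Powers of ten, each correctly rounded to the nearest double.
# Assumption: a real implementation would hard-code a table like this.
POWERS_OF_TEN = [float(f"1e{k}") for k in range(309)]
MAX_DIGITS = 17  # assumed cap on significant digits


def table_strtod(text: str) -> float:
    """Hypothetical sketch: accumulate digits, then scale exactly once.

    Assumes well-formed input such as '-12.5e-3'; no error handling.
    """
    sign, body = 1.0, text
    if body[:1] in ("+", "-"):
        sign = -1.0 if body[0] == "-" else 1.0
        body = body[1:]

    mantissa, _, exp_part = body.lower().partition("e")
    int_part, _, frac_part = mantissa.partition(".")
    exponent = int(exp_part) if exp_part else 0

    number = 0.0
    used = 0  # significant digits consumed so far
    for i, ch in enumerate(int_part + frac_part):
        in_fraction = i >= len(int_part)
        if used < MAX_DIGITS:
            number = number * 10.0 + int(ch)
            if number > 0.0:
                used += 1        # leading zeros are not significant
            if in_fraction:
                exponent -= 1
        elif not in_fraction:
            exponent += 1        # a dropped integer digit still shifts the scale

    # One multiplication or division by a table entry
    if exponent > 308:
        number = float("inf") if number else 0.0
    elif exponent >= 0:
        number *= POWERS_OF_TEN[exponent]
    elif exponent >= -308:
        number /= POWERS_OF_TEN[-exponent]
    else:
        # Split into two divisions so the divisor itself never overflows
        number /= POWERS_OF_TEN[308]
        number /= POWERS_OF_TEN[min(-exponent - 308, 308)]

    return sign * number


if __name__ == "__main__":
    for s in ["0.1", "123.456", "-2.5e-3", "3.141592653589793"]:
        print(s, table_strtod(s), float(s))
```

When the digits fit exactly in a double and the power of ten is itself exact, a single correctly rounded multiplication or division gives the nearest double. Longer inputs or larger exponents can still be off in the last place.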
The round_trip option uses Python’s own PyOS_string_to_double, which by all measures is the most complicated. This approach guarantees compatibility with other places Python interprets strings as floats, but it sets exceptions and as such must keep the GIL.
The core of PyOS_string_to_double is the private function _Py_dg_strtod (which is closely based on this: http://www.netlib.org/fp/dtoa.c).
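In CPython the built-in float() is, as far as I know, one of those "other places" that parse strings through this same routine. Plain Python can therefore illustrate the property that gives the option its name: a double printed with repr() and parsed back yields the identical value. This is a small check under that assumption, and the helper names are hypothetical.

```python
import math
import random
import struct


def random_double(rng: random.Random) -> float:
    """Return a random finite double built from 64 random bits."""
    while True:
        bits = rng.getrandbits(64)
        value = struct.unpack("<d", struct.pack("<Q", bits))[0]
        if math.isfinite(value):
            return value


def survives_round_trip(value: float) -> bool:
    """True if printing the double and parsing the text gives it back."""
    return float(repr(value)) == value


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [random_double(rng) for _ in range(1000)]
all_ok = all(survives_round_trip(v) for v in samples)

if __name__ == "__main__":
    print("all samples survived the round trip:", all_ok)
```

A converter that is not correctly rounded cannot promise this for every input, which is the practical reason to pick round_trip when you write floats to CSV and need to read back the same values.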