Python ctypes's sprintf formats any float type as b'0.000000' or b'5.25662e-315'

Question

I’m experimenting with the fastest way to format a float as a string with as minimal a representation as possible (no trailing 0’s, no decimal places if it can be helped, no scientific notation). I’ve decided to try Python’s ctypes module.
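For reference, the %g conversion the question tries only partly matches these goals. It follows C printf semantics, and Python's % operator applies the same rules to floats. It drops trailing zeros and a bare trailing decimal point. However, it rounds to 6 significant digits by default, and it switches to scientific notation when the exponent is below -4 or at least the precision. A quick pure-Python check (the sample values are arbitrary):

```python
# Python's "%g" applies the same rules as C's printf %g:
# 6 significant digits by default, trailing zeros (and a bare ".") removed,
# exponent notation when the exponent is < -4 or >= the precision.
samples = [3.0, 2.5, 0.1234567, 0.00001, 1234567.0]
for value in samples:
    print(repr(value), "->", "%g" % value)
# 3.0 -> 3
# 2.5 -> 2.5
# 0.1234567 -> 0.123457
# 1e-05 -> 1e-05
# 1234567.0 -> 1.23457e+06
```

So %g on its own does not guarantee "no scientific notation", whether it is invoked through ctypes or through Python.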

Based on several examples, I thought this function would work, but instead it always prints b'0.000000' when using %f or b'5.25124e-315' when using %g.
Code:

from ctypes import *
import msvcrt
def floatToStr3(n:float)->str:
    libc = cdll.msvcrt
    print("n in:", n)
    sb = create_string_buffer(100)
    libc.sprintf(sb, b"%g", c_float(n))
    print("sb out:", sb.value)
    return sb.value

import random
floatToStr3(random.random())
floatToStr3(random.random())
floatToStr3(random.random())
floatToStr3(random.random())
floatToStr3(random.random())
floatToStr3(random.random())

Output:

n in: 0.9164215022054657
sb out: b'5.25662e-315'
n in: 0.6366531536720886
sb out: b'5.23343e-315'
n in: 0.07371310207853521
sb out: b'5.1052e-315'
n in: 0.6353450576077702
sb out: b'5.23332e-315'
n in: 0.2839487624658935
sb out: b'5.18628e-315'
n in: 0.5540225836869241
sb out: b'5.22658e-315'

I have a strong feeling I’m just not using create_string_buffer correctly, but I don’t know what the answer is. Formatting using ints works.
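Here is a plausible reason why ints work. When argtypes isn't set, ctypes passes a Python int as a C int, which is exactly what %d expects. It refuses to convert a bare Python float at all, so floats need an explicit wrapper, and for %f/%g that wrapper has to be c_double.

The sketch below makes two assumptions. First, the C library is reachable through CDLL(None) on POSIX systems (msvcrt on Windows, as in the question). Second, following the ctypes documentation's advice for variadic functions, argtypes is declared for sprintf's fixed parameters only.

```python
import ctypes
import sys

# Assumption: CDLL(None) exposes the C library on POSIX; Windows uses msvcrt.
libc = ctypes.cdll.msvcrt if sys.platform == "win32" else ctypes.CDLL(None)
# Declare only the fixed (non-variadic) parameters of sprintf.
libc.sprintf.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
buf = ctypes.create_string_buffer(100)

# A Python int is converted to a C int, which is what %d expects.
libc.sprintf(buf, b"%d", 42)
print(buf.value)  # b'42'

# A bare Python float is rejected: ctypes doesn't know how to convert it.
try:
    libc.sprintf(buf, b"%g", 0.5)
except ctypes.ArgumentError as exc:
    print("ArgumentError:", exc)

# Wrapped in c_double, the value has the type %g actually reads.
libc.sprintf(buf, b"%g", ctypes.c_double(0.5))
print(buf.value)  # b'0.5'
```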

Using Python 3.7.4 on Windows 10.

Asked By: tngreene


Answer 1

Observations:

Listing [Python.Docs]: ctypes – A foreign function library for Python
Check [SO]: C function called from Python via ctypes returns incorrect value (@CristiFati’s answer) when working with CTypes functions
[Python.Docs]: Built-in Types – Numeric Types – int, float, complex states (emphasis is mine):

Floating point numbers are usually implemented using double in C

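Passing the number as ctypes.c_float is the culprit (also intuited by @frost-nzcr4). sprintf is variadic, and C's default argument promotions convert any float passed through the variadic part to double, so the %f and %g conversions always read a double (8 bytes). ctypes doesn't apply that promotion: it passes the 4-byte float as is, and sprintf reinterprets its bit pattern as part of a double. With the remaining bytes zero (as in the question), that is a tiny denormal around 5e-315, which %g prints as such and %f rounds to 0.000000. Casting to c_float also loses precision (float is typically 4 bytes long, while double is 8), but that alone would not turn 0.9 into something close to 0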
Looking only at the C function itself, calling sprintf directly may well be faster than calling a Python conversion function. But let’s not forget that Python has many optimizations, and the overhead needed to make the call possible (Python <=> C conversions) can be higher, so in some cases the overall performance is worse than that of a pure Python solution
When speed matters, creating sb = create_string_buffer(100) (and the other objects, like the library handle) inside the function, on every call, is wasteful. Do it once, outside (at the beginning), and only use it inside the function
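The c_float effect can be reproduced without calling C at all. Below is a minimal pure-Python sketch (struct based, so it runs anywhere) of what sprintf ends up reading: the 4 bytes of the c_float followed by 4 zero bytes, interpreted as a little-endian double. The zero padding is an assumption that happens to match the question's output; the C standard does not guarantee it. float_bits_as_double is a hypothetical helper name.

```python
import struct


def float_bits_as_double(x: float) -> float:
    """Hypothetical helper: mimic sprintf reading a c_float's 4 bytes as a double.

    Assumption: the float occupies the low 4 bytes of an 8-byte slot and the
    upper 4 bytes are zero (this matches the output shown in the question).
    """
    float_bytes = struct.pack("<f", x)   # 32-bit IEEE 754 float, little-endian
    slot = float_bytes + b"\x00" * 4     # pad to 8 bytes with zeros
    return struct.unpack("<d", slot)[0]  # reinterpret as a 64-bit double


n = 0.9164215022054657
garbage = float_bits_as_double(n)
print("%g" % garbage)  # 5.25662e-315, as in the question
print("%f" % garbage)  # 0.000000
```

The same computation gives 5.23343e-315 for 0.6366531536720886, which also matches the question's output.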

Below is an example.

code00.py:

#!/usr/bin/env python

import ctypes as cts
import random
import sys
import timeit


c_float = cts.c_float
c_double = cts.c_double
cdll = cts.cdll
create_string_buffer = cts.create_string_buffer

swprintf = cts.cdll.msvcrt.swprintf  # msvcrt functions are cdecl: use cdll (windll is stdcall and breaks on 32-bit Python)
swprintf.argtypes = (cts.c_wchar_p, cts.c_wchar_p, cts.c_double)  # !!! swprintf (and all the family functions) have varargs !!!
swprintf.restype = cts.c_int

buf = cts.create_unicode_buffer(100)


def original(f: float) -> str:
    libc_ = cdll.msvcrt
    #print("n in:", f)
    sb = create_string_buffer(100)
    libc_.sprintf(sb, b"%g", c_double(f))
    #print("sb out:", sb.value)
    return sb.value.decode()


def improved(f: float) -> str:
    swprintf(buf, "%g", f)
    return buf.value


def percent(f: float) -> str:
    return "%g" % f


def format_(f: float) -> str:
    return "{0:g}".format(f)


def f_string_default(f: float) -> str:
    return f"{f}"

def f_string_g(f: float) -> str:
    return f"{f:g}"


number_count = 3
numbers = [random.random() for _ in range(number_count)]
number = numbers[0]


def main(*argv):
    funcs = (
        original,
        improved,
        percent,
        format_,
        f_string_default,
        f_string_g,
    )

    print("Functional tests")
    for f in numbers:
        print("\nNumber (default format): {0:}".format(f))
        for func in funcs:
            print("    {0:s}: {1:}".format(func.__name__, func(f)))

    print("\nPerformance tests (time took by each function)")
    for func in funcs:
        t = timeit.timeit(stmt="func(number)", setup="from __main__ import number, {0:s} as func".format(func.__name__))
        print("    {0:s}: {1:}".format(func.__name__, t))


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.")
    sys.exit(rc)
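code00.py only runs on Windows, because it relies on msvcrt and swprintf. Below is a portable sketch of the "improved" approach, assuming the C library is reachable through CDLL(None) on POSIX. It makes three choices:

- The library, function and buffer are set up once, at module level.
- argtypes is declared only for sprintf's fixed parameters, as the ctypes documentation recommends for variadic functions.
- The value is wrapped in c_double explicitly.

ctypes_g is a hypothetical function name.

```python
import ctypes as cts
import sys
import timeit

# One-time setup, outside the formatting function.
# Assumption: CDLL(None) exposes the C library on POSIX; Windows uses msvcrt.
_libc = cts.cdll.msvcrt if sys.platform == "win32" else cts.CDLL(None)
_sprintf = _libc.sprintf
# Declare only the fixed parameters; the variadic value goes in as c_double.
_sprintf.argtypes = (cts.c_char_p, cts.c_char_p)
_sprintf.restype = cts.c_int
_buf = cts.create_string_buffer(100)


def ctypes_g(f: float) -> str:
    """Hypothetical helper: format f with C's %g through ctypes."""
    _sprintf(_buf, b"%g", cts.c_double(f))
    return _buf.value.decode()


def percent(f: float) -> str:
    return "%g" % f


if __name__ == "__main__":
    number = 0.09201480511926563
    print(ctypes_g(number), percent(number))
    for func in (ctypes_g, percent):
        t = timeit.timeit(lambda: func(number), number=100000)
        print("    {0:s}: {1:.4f}".format(func.__name__, t))
```

Whether this beats "%g" % f has to be measured on the target machine. Hoisting the setup removes per-call work, but the explicit c_double(f) wrapper and the ctypes call itself still cost something.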

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q061231308]> "e:\Work\Dev\VEnvs\py_pc064_03.07_test0\Scripts\python.exe" code00.py
Python 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)] 064bit on win32

Functional tests

Number (default format): 0.09201480511926563
    original: 0.0920148
    improved: 0.0920148
    percent: 0.0920148
    format_: 0.0920148
    f_string_default: 0.09201480511926563
    f_string_g: 0.0920148

Number (default format): 0.3778731171686579
    original: 0.377873
    improved: 0.377873
    percent: 0.377873
    format_: 0.377873
    f_string_default: 0.3778731171686579
    f_string_g: 0.377873

Number (default format): 0.8507691869686248
    original: 0.850769
    improved: 0.850769
    percent: 0.850769
    format_: 0.850769
    f_string_default: 0.8507691869686248
    f_string_g: 0.850769

Performance tests (time took by each function)
    original: 1.7038035999999999
    improved: 1.4332302
    percent: 0.25398619999999994
    format_: 0.37500920000000004
    f_string_default: 0.9683423999999996
    f_string_g: 0.33258160000000014

Done.

As seen, the builtin Python alternatives perform much better than the CTypes ones. What I find curious (maybe I did something wrong) is that the f-string variant is much slower than I expected, but only with the default specifier; things are "a bit" different when using :g (thanks @pankaj for the tip!).
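Here is a likely explanation, not verified here. An f-string with an empty format spec returns the same result as str(f). For floats, that is the shortest string that round-trips, which can take up to 17 significant digits; hence the long f_string_default values above. :g and %g, by contrast, stop at 6 significant digits, so the two variants do different amounts of work.

The sketch below checks these equivalences. The timings it prints are illustrative only and vary by machine and Python version.

```python
import timeit

x = 0.09201480511926563

# Empty format spec: same result as str(x), the shortest round-tripping repr.
print(f"{x}", str(x), repr(x))
# "g" type (default precision 6): same conversion as the % operator's %g.
print(f"{x:g}", "%g" % x)

# Illustrative timing only (numbers depend on machine and Python version).
for stmt in ('f"{x}"', 'f"{x:g}"', '"%g" % x'):
    print(stmt, timeit.timeit(stmt, globals={"x": x}, number=200_000))
```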
It might also be worth reading [Python]: Python Patterns – An Optimization Anecdote.

Answered By: CristiFati
