Is there an efficient way to determine if a sum of floats will be order invariant?

Question:

Due to precision limitations in floating point numbers, the order in which numbers are summed can affect the result.

>>> 0.3 + 0.4 + 2.8
3.5
>>> 2.8 + 0.4 + 0.3
3.4999999999999996

This small error can become a bigger problem if the results are then rounded.

>>> round(0.3 + 0.4 + 2.8)
4
>>> round(2.8 + 0.4 + 0.3)
3

I would like to generate a list of random floats such that their rounded sum does not depend on the order in which the numbers are summed. My current brute force approach is O(n!). Is there a more efficient method?

import random
import itertools


def gen_sum_safe_seq(func, length: int, precision: int) -> list[float]:
    """
    Return a list of floats that has the same sum when rounded to the given
    precision regardless of the order in which its values are summed.
    """
    invalid = True
    while invalid:
        invalid = False
        nums = [func() for _ in range(length)]
        first_sum = round(sum(nums), precision)
        for p in itertools.permutations(nums):
            if round(sum(p), precision) != first_sum:
                invalid = True
                print(f"rejected {nums}")
                break

    return nums

for _ in range(3):
    nums = gen_sum_safe_seq(
        func=lambda: round(random.gauss(3, 0.5), 3),
        length=10,
        precision=2,
    )
    print(f"{nums} sum={sum(nums)}")

For context, as part of a programming exercise I’m providing a list of floats that model a measured value over time to ~1000 entry-level programming students. They will sum them in a variety of ways. Provided that their code is correct, I’d like for them all to get the same result to simplify checking their code. I do not want to introduce the complexities of floating point representation to students at this level.

Asked By: John Cole


Answers:

You are asking for a numeric error analysis; there is a rich literature on this. In your example you found the relative error was unacceptably large. Plus, you’re cramming infinite repeating fractions into a 53-bit mantissa, with predictable truncation issues. Adding numbers of different magnitudes tends to cause trouble: here, 2.8 is more than 8x 0.3, so we risk losing three bits of precision.
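
For illustration, the binary exponents of 2.8 and 0.3 differ by three (p+1 versus p-2 in the hex floats below), so aligning 0.3’s significand with 2.8’s shifts three bits off the end:

>>> (0.3).hex()
'0x1.3333333333333p-2'
>>> (2.8).hex()
'0x1.6666666666666p+1'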

You’re making this problem much too hard. Simply use decimal values, or do the equivalent: scale your random floats by some large number, perhaps 1e9, and truncate to integer. Now you’re summing integers, with no repeating fractional digits, so we’re back to being able to rely on commutativity and associativity. Remember to scale sums appropriately when reporting the results.
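
A minimal sketch of that scale-then-truncate idea, reusing the question’s gauss(3, 0.5) generator; the SCALE name and the 10 ** 9 factor are illustrative choices, not prescriptions:

import random

SCALE = 10 ** 9  # large enough to keep every digit we care about

raw = [random.gauss(3, 0.5) for _ in range(10)]
nums = [int(x * SCALE) for x in raw]  # exact Python ints from here on

total = sum(nums)     # integer addition: any order gives the same total
print(total / SCALE)  # scale back down only when reporting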

What you want is "math". What you’ll get is a "machine representation". So choose a representation that is a good fit for your use case.

EDIT

Oh, the educational context is illuminating. Avoiding negative round-off errors is crucial.

Simply add epsilon = 2 ** -50 to all those round(..., 3) figures being summed. With epsilon big enough to turn even the negative rounding errors into positive terms, for N numbers with an average of mean, your FP sum will be approximately N * mean + N * epsilon, and the final rounding operation trims the accumulated error. We exploit the facts that the input values are in a defined range and N is small, so lots of zero bits separate those two terms.

The naive sum of three-digit quantities is d1 + err1 + d2 + err2 + ... + dN + errN, where each err is a positive or negative rounding error that comes from truncating a repeating fraction at 53 bits. Separating the terms gives d1 + ... + dN + N * random_var_with_zero_mean. I am proposing d1 + err1 + eps + d2 + err2 + eps + ..., which is d1 + ... + dN + N * eps + small_random_error. In particular, eps ensures that we only add positive errors as we accumulate, and by "small" I mean small_random_error < N * eps.

import random
from itertools import permutations

eps = 2 ** -50  # 2 ** -52 suffices, but -50 is more obvious during debugging
nums = [round(random.gauss(3, 0.5), 3) + eps for _ in range(10)]
print(expected := round(sum(nums), 3))

for perm in permutations(nums):
    assert round(sum(perm), 3) == expected

Assume positive values, e.g. uniform draws from the unit interval. Then the standard numerical-analysis advice is to first order the values from smallest to largest, and then sum them.

If your students will distribute the summation over K hosts, they should walk the sorted values with a stride of K, as sketched below.
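
A hypothetical illustration of that stride-K walk; K, the value range, and the final combining step are my assumptions:

import random

K = 4  # number of hosts; an illustrative choice
xs = sorted(random.random() for _ in range(1000))  # smallest to largest
chunks = [xs[k::K] for k in range(K)]              # host k walks with stride K
partials = [sum(chunk) for chunk in chunks]
print(sum(sorted(partials)))                       # combine, again smallest first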

If we don’t need very tight error bounds, then histogram estimates can save a big-Oh log N factor, or can even let you begin computations after a "taste the prefix" operation which takes constant time.

Answered By: J_H

Not that I know of, but a practical approach is to use math.fsum() instead. While some platforms are perverse nearly beyond repair, on most platforms fsum() returns the infinitely-precise result subject to a single rounding error at the end. Which means the final result is independent of the order in which elements are given. For example,

>>> from math import fsum
>>> from itertools import permutations
>>> for p in permutations([0.3, 0.4, 2.8]):
...     print(p, fsum(p))
(0.3, 0.4, 2.8) 3.5
(0.3, 2.8, 0.4) 3.5
(0.4, 0.3, 2.8) 3.5
(0.4, 2.8, 0.3) 3.5
(2.8, 0.3, 0.4) 3.5
(2.8, 0.4, 0.3) 3.5

Python’s fsum() docs go on to point to slower ways that are more robust against perverse platform quirks.

Arguably silly

Here’s another approach: fiddle the numbers you generate, clearing enough low-order bits so that no rounding of any kind is ever needed no matter how an addition tree is arranged. I haven’t thought hard about this – it’s not worth the effort 😉 For a start, I haven’t thought about negative inputs at all.

def crunch(xs):
    from math import floor, ulp, ldexp
    if any(x < 0.0 for x in xs):
        raise ValueError("all elements must be >= 0.0")
    # The coarsest spacing any partial sum can need: the largest input's
    # ulp, scaled up by 2 ** len(xs).bit_length() to cover the growth from
    # adding len(xs) values.
    target_ulp = ldexp(ulp(max(xs)), len(xs).bit_length())
    # Round each input down to a multiple of that spacing, clearing its
    # low-order bits.
    return [floor(x / target_ulp) * target_ulp
            for x in xs]

Then, e.g.,

>>> xs = crunch([0.3, 0.4, 2.8])
>>> for x in xs:
...     print(x, x.hex())
0.29999999999999893 0x1.3333333333320p-2
0.3999999999999986 0x1.9999999999980p-2
2.799999999999999 0x1.6666666666664p+1

The decimal values are "a mess", but, from the hex values, you can see that the binary values reliably have enough low-order 0 bits to absorb any shifts that may be needed during a sum. The order of summation then makes no difference:

>>> for p in permutations(xs):
...     print(p, sum(p))
(0.29999999999999893, 0.3999999999999986, 2.799999999999999) 3.4999999999999964
(0.29999999999999893, 2.799999999999999, 0.3999999999999986) 3.4999999999999964
(0.3999999999999986, 0.29999999999999893, 2.799999999999999) 3.4999999999999964
(0.3999999999999986, 2.799999999999999, 0.29999999999999893) 3.4999999999999964
(2.799999999999999, 0.29999999999999893, 0.3999999999999986) 3.4999999999999964
(2.799999999999999, 0.3999999999999986, 0.29999999999999893) 3.4999999999999964

and

>>> import random, math
>>> xs = [random.random() * 1e3 for i in range(100_000)]
>>> sum(xs)
49872035.43787267
>>> math.fsum(xs) # different
49872035.43787304
>>> sum(sorted(xs, reverse=True)) # and different again
49872035.43787266
>>> ys = crunch(xs) # now fiddle the numbers
>>> sum(ys)  # and all three ways are the same
49872035.43712826
>>> math.fsum(ys)
49872035.43712826
>>> sum(sorted(ys, reverse=True))
49872035.43712826

The good news is that this is obviously linear-time in the number of inputs. The bad news is that the more inputs there are, and the wider the dynamic range across them, the more trailing bits have to be thrown away.
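
A quick sanity check on the 100_000-input run above (my own arithmetic, not part of the answer): target_ulp there is ulp(1000.0) scaled by 2 ** 17, about 1.5e-8, each input loses on average half of that, and 100_000 such losses add up to roughly 7.5e-4, matching the gap between the crunched and original totals.

>>> from math import ulp
>>> (100_000).bit_length()
17
>>> ulp(1000.0) * 2 ** 17  # target_ulp for that run
1.4901161193847656e-08
>>> 100_000 * ulp(1000.0) * 2 ** 17 / 2  # expected total loss
0.0007450580596923828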

Answered By: Tim Peters

This is faster (0.3 seconds instead of your 8 seconds for length 10, and 3.4 seconds for length 12) and considers more ways to sum: not just linear like ((a+b)+c)+d, but also divide&conquer summation like (a+b)+(c+d).

The core part is the sums function, which computes all possible sums. First it enumerates the numbers, so it can use sets without losing duplicate numbers. Then its inner helper sums does the actual work. It tries all possible splits of the given numbers into a left subset and a right subset, computes all possible sums for each, and combines them.

import random
import itertools
import functools

def sums(nums):
    @functools.cache
    def sums(nums):
        if len(nums) == 1:
            [num] = nums
            return {num[1]}
        result = set()
        for k in range(1, len(nums)):
            for left in map(frozenset, itertools.combinations(nums, k)):
                right = nums - left
                left_sums = sums(left)
                right_sums = sums(right)
                for L in left_sums:
                    for R in right_sums:
                        result.add(L + R)
        return result
    return sums(frozenset(enumerate(nums)))
    
def gen_sum_safe_seq(func, length: int, precision: int) -> list[float]:
    """
    Return a list of floats that has the same sum when rounded to the given
    precision regardless of the order in which its values are summed.
    """
    while True:
        nums = [func() for _ in range(length)]
        rounded_sums = {
            round(s, precision)
            for s in sums(nums)
        }
        if len(rounded_sums) == 1:
            return nums
        print(f"rejected {nums}")

for _ in range(3):
    nums = gen_sum_safe_seq(
        func=lambda: round(random.gauss(3, 0.5), 3),
        length=10,
        precision=2,
    )
    print(f"{nums} sum={sum(nums)}")

Attempt This Online!

Answered By: Kelly Bundy

The easiest way is to create random integers, and then divide (or multiply) them all by the same power of 2.

As long as the sum of the absolute values of the original integers fits into 52 bits, you can add the resulting floats without any rounding errors.
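
A minimal sketch of that idea; the value range and the 2.0 ** -10 scale are illustrative assumptions:

import random

ints = [random.randrange(4000) for _ in range(10)]  # sums stay far below 2 ** 52
nums = [i * 2.0 ** -10 for i in ints]               # scaling by a power of 2 is exact

expected = sum(nums)
for _ in range(1000):
    random.shuffle(nums)
    assert sum(nums) == expected  # no rounding ever occurs, in any order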

Answered By: Matt Timmermans

Faster variant of my first answer, this one only checking linear summation like ((a+b)+c)+d, as you do yourself, not also divide&conquer summation like (a+b)+(c+d). Takes me 0.03 seconds for length 10 and 0.12 seconds for length 12.

import random
import functools

def sums(nums):
    @functools.cache
    def sums(nums):
        if len(nums) == 1:
            [num] = nums
            return {num[1]}
        result = set()
        for last in nums:
            before = nums - {last}
            before_sums = sums(before)
            _, R = last
            for L in before_sums:
                result.add(L + R)
        return result
    return sums(frozenset(enumerate(nums)))
    
def gen_sum_safe_seq(func, length: int, precision: int) -> list[float]:
    """
    Return a list of floats that has the same sum when rounded to the given
    precision regardless of the order in which its values are summed.
    """
    while True:
        nums = [func() for _ in range(length)]
        rounded_sums = {
            round(s, precision)
            for s in sums(nums)
        }
        if len(rounded_sums) == 1:
            return nums
        print(f"rejected {nums}")

for _ in range(3):
    nums = gen_sum_safe_seq(
        func=lambda: round(random.gauss(3, 0.5), 3),
        length=10,
        precision=2,
    )
    print(f"{nums} sum={sum(nums)}")

Attempt This Online!

Answered By: Kelly Bundy