Wrong numpy mean value?

Question:

I usually work with huge simulations. Sometimes I need to compute the center of mass of the set of particles. I noticed that in many situations, the mean value returned by numpy.mean() is wrong. I can figure out that it is due to saturation of the accumulator. To avoid the problem, I can split the summation over all particles into small sets of particles, but that is uncomfortable. Does anybody have an idea of how to solve this problem in an elegant way?

Just to pique your curiosity, the following example produces something similar to what I observe in my simulations:

import numpy as np
a = np.ones((1024,1024), dtype=np.float32)*30504.00005

If you check the .max and .min values, you get:

a.max()
=> 30504.0
a.min()
=> 30504.0

However, the mean value is:

a.mean()
=> 30687.236328125

You can see that something is wrong here. This does not happen when using dtype=np.float64, so it would be nice to solve the problem for single precision.

Asked By: Alejandro


Answers:

This isn’t a NumPy problem, it’s a floating-point issue. The same occurs in C:

float acc = 0;
for (int i = 0; i < 1024*1024; i++) {
    acc += 30504.00005f;
}
acc /= (1024*1024);
printf("%fn", acc);  // 30687.304688


The problem is that floating-point has limited precision; as the accumulator value grows relative to the elements being added to it, the relative precision drops.
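For instance, one quick way to see this effect in NumPy: once the accumulator has grown large enough, adding a comparatively small element changes nothing at all in float32.

import numpy as np

acc = np.float32(2.0 ** 25)   # an accumulator that has already grown large
elem = np.float32(1.0)        # a comparatively small element

# Adjacent float32 values near 2**25 are 4.0 apart, so adding 1.0
# rounds straight back to the original accumulator value.
print(np.spacing(acc))        # 4.0
print(acc + elem == acc)      # True: the element's contribution is lost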

One solution is to limit the relative growth, by constructing an adder tree. Here’s an example in C (my Python isn’t good enough…):

/* Pairwise (tree) summation; assumes n is a power of two. */
float sum(float *p, int n) {
    if (n == 1) return *p;
    for (int i = 0; i < n/2; i++) {
        p[i] += p[i+n/2];
    }
    return sum(p, n/2);
}

float x[1024*1024];
for (int i = 0; i < 1024*1024; i++) {
    x[i] = 30504.00005f;
}

float acc = sum(x, 1024*1024);

acc /= (1024*1024);
printf("%fn", acc);   // 30504.000000

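The same idea ports directly to NumPy; a minimal sketch (assuming the number of elements is a power of two) could look like this:

import numpy as np

def pairwise_sum(x):
    # Fold the array in half repeatedly so that each addition combines
    # operands of similar magnitude; assumes x.size is a power of two.
    x = x.copy()              # work on a copy so the input is not modified
    n = x.size
    while n > 1:
        n //= 2
        x[:n] += x[n:2*n]
    return x[0]

a = np.ones((1024, 1024), dtype=np.float32) * 30504.00005
print(pairwise_sum(a.ravel()) / a.size)   # 30504.0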

Answered By: Oliver Charlesworth

You can call np.mean with a dtype keyword argument that specifies the type of the accumulator (which, for floating-point arrays, defaults to the same type as the array).

So calling a.mean(dtype=np.float64) will solve your toy example, and perhaps your issue with larger arrays.
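Applied to the array from the question, the float64 accumulator gives the exact result:

a.mean(dtype=np.float64)
=> 30504.0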

Answered By: Jaime

Quick and dirty answer

assert a.ndim == 2
a.mean(axis=-1).mean()

This gives the expected result for the 1024*1024 matrix, but of course this will not be true for larger arrays…

If computing the mean is not a bottleneck in your code, I would implement an ad-hoc algorithm in Python myself; the details, however, depend on your data structure.

If computing the mean is a bottleneck, then some specialized (parallel) reduction algorithm could solve the problem.

Edit

This approach may seem silly, but it surely mitigates the problem and is almost as efficient as .mean() itself.

In [65]: a = np.ones((1024,1024), dtype=np.float32)*30504.00005

In [66]: a.mean()
Out[66]: 30687.236328125

In [67]: a.mean(axis=-1).mean()
Out[67]: 30504.0

In [68]: %timeit a.mean()
1000 loops, best of 3: 894 us per loop

In [69]: %timeit a.mean(axis=-1).mean()
1000 loops, best of 3: 906 us per loop

Giving a more sensible answer requires more information on the data structures, their sizes, and the target architecture.

Answered By: Stefano M

You can partially remedy this by using the built-in math.fsum, which tracks the partial sums (the docs contain a link to an ActiveState recipe prototype):

>>> from math import fsum
>>> fsum(a.ravel())/(1024*1024)
30504.0

As far as I’m aware, numpy does not have an analog.

Answered By: ev-br