Wrong numpy mean value?
Question:
I usually work with huge simulations. Sometimes I need to compute the center of mass of a set of particles, and I have noticed that in many situations the mean value returned by numpy.mean()
is wrong. I can tell that it is due to saturation of the accumulator. To avoid the problem, I can split the summation over all particles into small sets, but that is awkward. Does anybody have an idea of how to solve this problem in an elegant way?
Just to pique your curiosity, the following example produces something similar to what I observe in my simulations:
import numpy as np
a = np.ones((1024,1024), dtype=np.float32)*30504.00005
If you check the .max
and .min
values, you get:
a.max()
=> 30504.0
a.min()
=> 30504.0
However, the mean value is:
a.mean()
=> 30687.236328125
You can see that something is wrong here. This does not happen with dtype=np.float64
, so it would be nice to have a solution for single precision.
Answers:
This isn’t a NumPy problem, it’s a floating-point issue. The same occurs in C:
float acc = 0;
for (int i = 0; i < 1024*1024; i++) {
acc += 30504.00005f;
}
acc /= (1024*1024);
printf("%f\n", acc); // 30687.304688
The problem is that floating-point has limited precision; as the accumulator value grows relative to the elements being added to it, the relative precision drops.
One solution is to limit the relative growth, by constructing an adder tree. Here’s an example in C (my Python isn’t good enough…):
float sum(float *p, int n) {
if (n == 1) return *p;
for (int i = 0; i < n/2; i++) {
p[i] += p[i+n/2];
}
return sum(p, n/2);
}
float x[1024*1024];
for (int i = 0; i < 1024*1024; i++) {
x[i] = 30504.00005f;
}
float acc = sum(x, 1024*1024);
acc /= (1024*1024);
printf("%f\n", acc); // 30504.000000
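For what it's worth, a rough Python translation of the adder tree above might look like this (a sketch only; like the C example, it assumes the length is a power of two):

```python
import numpy as np

def tree_sum(x):
    # Repeatedly fold the second half of the array onto the first half,
    # so partial sums stay comparable in magnitude and relative
    # precision is preserved. Assumes len(x) is a power of two.
    x = np.array(x, copy=True)  # work on a copy; the C version modifies in place
    n = x.size
    while n > 1:
        n //= 2
        x[:n] += x[n:2 * n]
    return x[0]

a = np.ones(1024 * 1024, dtype=np.float32) * 30504.00005
print(tree_sum(a) / a.size)  # 30504.0
```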
You can call np.mean
with a dtype
keyword argument, which specifies the type of the accumulator (for floating-point arrays, it defaults to the same type as the array).
So calling a.mean(dtype=np.float64)
will solve your toy example, and perhaps your issue with larger arrays.
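To illustrate on the toy example (note that the exact float32 result of a plain a.mean() can vary with NumPy version, since newer releases sum pairwise internally):

```python
import numpy as np

a = np.ones((1024, 1024), dtype=np.float32) * 30504.00005

# With a float64 accumulator, the running sum (~3.2e10) is still
# represented exactly, so the mean comes out right even though the
# array itself is float32.
print(a.mean(dtype=np.float64))  # 30504.0
```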
Quick and dirty answer
assert a.ndim == 2
a.mean(axis=-1).mean()
This gives the expected result for the 1024*1024 matrix, but of course this will not be true for larger arrays…
If computing the mean will not be a bottleneck in your code, I would implement an ad-hoc algorithm in Python myself; the details, however, depend on your data structure.
If computing the mean is a bottleneck, then some specialized (parallel) reduction algorithm could solve the problem.
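As a sketch of such an ad-hoc reduction (a hypothetical helper, not from the original answer): average small chunks first, then average the chunk means, so that no accumulator grows large relative to its summands. This sketch assumes the array size is a multiple of the chunk size:

```python
import numpy as np

def chunked_mean(x, chunk=4096):
    # Mean of each small chunk, then mean of the chunk means.
    # Each partial accumulator only ever sums `chunk` elements,
    # which limits the loss of relative precision.
    x = np.ravel(x)
    assert x.size % chunk == 0, "sketch assumes size is a multiple of chunk"
    partial = x.reshape(-1, chunk).mean(axis=1)
    return partial.mean()

a = np.ones((1024, 1024), dtype=np.float32) * 30504.00005
print(chunked_mean(a))  # 30504.0
```

The per-chunk means could also be computed in parallel, which is the shape a parallel reduction would take.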
Edit
This approach may seem silly, but it will surely mitigate the problem and is almost as efficient as .mean()
itself.
In [65]: a = np.ones((1024,1024), dtype=np.float32)*30504.00005
In [66]: a.mean()
Out[66]: 30687.236328125
In [67]: a.mean(axis=-1).mean()
Out[67]: 30504.0
In [68]: %timeit a.mean()
1000 loops, best of 3: 894 us per loop
In [69]: %timeit a.mean(axis=-1).mean()
1000 loops, best of 3: 906 us per loop
Giving a more sensible answer would require more information on the data structures, their sizes, and the target architecture.
You can partially remedy this by using the built-in math.fsum
, which tracks the partial sums (the docs contain a link to an ASPN recipe prototype):
>>> from math import fsum
>>> fsum(a.ravel()) / (1024*1024)
30504.0
As far as I’m aware, numpy
does not have an analog.
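If a dependency-free, single-pass alternative is wanted, compensated (Kahan) summation works in a similar spirit to fsum, though it only bounds the rounding error rather than tracking it exactly. A pure-Python sketch (slow for big arrays, so shown here on a smaller one):

```python
import numpy as np

def kahan_mean(values):
    # Kahan compensated summation: `comp` carries the low-order bits
    # lost when each element is added to the running total, and is
    # subtracted back in on the next iteration.
    total = np.float32(0.0)
    comp = np.float32(0.0)
    n = 0
    for v in values:
        y = np.float32(v) - comp
        t = total + y
        comp = (t - total) - y
        total = t
        n += 1
    return total / np.float32(n)

a = np.ones(1 << 16, dtype=np.float32) * 30504.00005
print(kahan_mean(a))
```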