Why does np.astype('uint8') give different results on Windows versus Mac?
Question:
I have a (1000,1000,3)-shaped numpy array (dtype='float32') and when I cast it to dtype='uint8' I get different results on Windows versus Mac.
Array is available here: https://www.dropbox.com/s/jrs4n2ayh86s0fn/image.npy?dl=0
On Mac
>>> import numpy as np
>>> X = np.load('image.npy')
>>> X = X.astype('uint8')
>>> X.sum()
167942490
On Windows
>>> import numpy as np
>>> X = np.load('image.npy')
>>> X = X.astype('uint8')
>>> X.sum()
323510676
Also reproduces with this array:
import numpy as np
X = np.array([
[[46410., 42585., 32640.],
[45645., 41820., 31875.],
[45390., 41310., 32130.]],
[[44880., 41055., 31110.],
[44115., 40290., 30345.],
[46410., 42330., 33150.]],
[[45390., 41310., 32130.],
[46155., 42075., 32895.],
[42840., 38760., 30090.]]], dtype=np.float32)
print(X.sum(), X.astype('uint8').sum())
Prints 1065135.0 2735 on Windows and 1065135.0 1860 on Mac.
Here are results across OS, Python, and NumPy versions:
Python 3.8.8 (Win) Numpy 1.22.4 => 1065135.0 2735
Python 3.10.6 (Mac) Numpy 1.24.2 => 1065135.0 2735
Python 3.7.12 (Mac) Numpy 1.21.6 => 1065135.0 1860
Answers:
This problem is due to a bad conversion causing integer overflows. NumPy uses C casts to convert values, and converting a float outside the range 0-255 to an 8-bit unsigned integer is undefined behaviour in C. We (the NumPy developers) have tried to report errors in this case without impacting performance, but that is not possible in all cases. The latest versions of NumPy mitigate this, but the issue is still only partially solved. See the 1.24.0 release notes, this issue and this one, as well as this PR (AFAIK, the first reference to this issue is found here).
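As an illustration of what such undefined behaviour often looks like in practice: on x86, out-of-range float-to-integer casts frequently compile to a truncate-then-wrap sequence, and the Windows result 2735 above matches reducing each value modulo 256. This is only a plausible mechanism, not something C or NumPy guarantees:

```python
import numpy as np

# The same 3x3x3 array from the question.
X = np.array([
    [[46410., 42585., 32640.],
     [45645., 41820., 31875.],
     [45390., 41310., 32130.]],
    [[44880., 41055., 31110.],
     [44115., 40290., 30345.],
     [46410., 42330., 33150.]],
    [[45390., 41310., 32130.],
     [46155., 42075., 32895.],
     [42840., 38760., 30090.]]], dtype=np.float32)

# Truncate into an integer type wide enough to hold every value
# (well-defined), then keep only the low 8 bits (wrap-around).
wrapped = (X.astype(np.int64) % 256).astype(np.uint8)
print(wrapped.sum())  # 2735, the Windows result above
```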
Anyway, even if the error happens not to be detected on your target machine, casting floating-point numbers outside the range 0-255 to uint8 is unsafe and you should not expect a correct result. You need to adapt your code so that no overflow happens in the first place. I also advise you to use at least NumPy 1.24.0 so that such errors are better reported.
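A minimal sketch of the fix: clamp (or rescale) the data into the uint8 range before casting. The sample values here stand in for the original array:

```python
import numpy as np

# Sample float32 values far outside the uint8 range [0, 255].
X = np.array([46410., 42585., 32640., 100.5, -7.0], dtype=np.float32)

# Clamp first, then cast: this is well-defined on every platform.
clamped = np.clip(X, 0, 255).astype(np.uint8)
print(clamped)  # [255 255 255 100   0]
```

If the data is actually on a 16-bit scale (max 65535), dividing by 257 before the cast preserves the relative intensities instead of saturating everything to 255.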
Related post: Why does numpy handle overflows inconsistently?