Why does np.astype('uint8') give different results on Windows versus Mac?

Question:

I have a (1000,1000,3) shaped numpy array (dtype='float32') and when I cast it to dtype='uint8' I get different results on Windows versus Mac.

Array is available here: https://www.dropbox.com/s/jrs4n2ayh86s0fn/image.npy?dl=0

On Mac

>>> import numpy as np
>>> X = np.load('image.npy')
>>> X = X.astype('uint8')
>>> X.sum()
167942490

On Windows

>>> import numpy as np
>>> X = np.load('image.npy')
>>> X = X.astype('uint8')
>>> X.sum()
323510676

Also reproduces with this array:

import numpy as np
X = np.array([
[[46410., 42585., 32640.],
 [45645., 41820., 31875.],
 [45390., 41310., 32130.]],

[[44880., 41055., 31110.],
 [44115., 40290., 30345.],
 [46410., 42330., 33150.]],

[[45390., 41310., 32130.],
 [46155., 42075., 32895.],
 [42840., 38760., 30090.]]], dtype=np.float32)

print(X.sum(), X.astype('uint8').sum())

Prints 1065135.0 2735 on Windows and 1065135.0 1860 on Mac.

Here are the results for different combinations of OS, Python, and NumPy versions:

Python 3.8.8  (Win) Numpy 1.22.4 => 1065135.0 2735 
Python 3.10.6 (Mac) Numpy 1.24.2 => 1065135.0 2735 
Python 3.7.12 (Mac) Numpy 1.21.6 => 1065135.0 1860 
Asked By: nickponline


Answers:

This problem is caused by an out-of-range conversion that overflows the target integer type. NumPy uses C casts to convert values, but converting a float outside the range 0-255 to an 8-bit unsigned integer is undefined behaviour in C, so the result can differ between platforms, compilers, and NumPy versions. We tried to do our best to report errors in this case without impacting performance, but this is not possible in all cases. The latest versions of NumPy should fix this, but the issue is still partially unsolved. See the 1.24.0 release notes, this issue and this one, as well as this PR (AFAIK, the first reference to this issue is found here).
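One way to find out whether an array is affected, regardless of the NumPy version, is to test its range before casting. This is a minimal sketch: `fits_uint8` is a hypothetical helper name, not a NumPy function, and the check is deliberately conservative (it also rejects NaN and infinities).

```python
import numpy as np

def fits_uint8(a):
    """Return True if every element lies in [0, 255].

    NaN and +/-inf fail both comparisons, so they are reported as unsafe too.
    (Hypothetical helper for illustration; not part of NumPy.)
    """
    a = np.asarray(a)
    return bool(np.all((a >= 0) & (a <= 255)))

# Values taken from the array in the question: all far above 255.
X = np.array([46410., 42585., 32640.], dtype=np.float32)
print(fits_uint8(X))                      # out of range, cast is unsafe
print(fits_uint8(np.float32([0, 128, 255])))  # in range, cast is safe
```

If the check returns False, the result of `astype('uint8')` should not be relied on.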

Anyway, although the error may not be detected on your target machine, casting floating-point numbers outside the range 0-255 to uint8 is unsafe and you should not expect a correct result. You need to adapt your code so that no overflow occurs in the first place. I also advise you to use at least version 1.24.0 of NumPy to better track such errors.
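As a sketch of what "no overflow in the first place" can look like, here are two common options: saturating with `np.clip`, or rescaling to the 0-255 range before casting. Which one is right depends on what the values mean. The rescaling by the array maximum is only an assumption for illustration, and the sample values (including the negative one) are made up apart from the first three, which come from the question.

```python
import numpy as np

X = np.array([46410., 42585., 32640., 255., 0., -3.], dtype=np.float32)

# Option 1: saturate. Everything below 0 becomes 0, everything above 255
# becomes 255, so the cast itself never sees an out-of-range value.
clipped = np.clip(X, 0, 255).astype(np.uint8)

# Option 2: rescale. Assumes (hypothetically) that the largest value should
# map to 255; negative values are clipped to 0 first.
nonneg = np.clip(X, 0, None)
scaled = np.round(nonneg / nonneg.max() * 255).astype(np.uint8)

print(clipped)
print(scaled)
```

Both variants give the same result on every platform, because every value handed to `astype` is already inside the range of `uint8`.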

Related post: Why does numpy handle overflows inconsistently?

Answered By: Jérôme Richard