Numpy overflow in short scalars

Question:

I have a large array of .wav samples (200k samples) loaded with scipy.io.wavfile. I tried to make a histogram of the data using matplotlib.pyplot.hist with automatic binning. It raised the error:

ValueError: Number of samples, -72, must be non-negative.

So I decided to set the bins myself using binwidth=1000:

min_bin = np.min(data[peaks])
max_bin = np.max(data[peaks])
plt.hist(data[peaks], bins=np.arange(min_bin,max_bin, binwidth))

When I do this, it gives the warning:

RuntimeWarning: overflow encountered in short_scalars
from scipy.io import wavfile

Here are the printed types and values of min_bin, max_bin, and data:

Type min_bin: <class 'numpy.int16'> max_bin: <class 'numpy.int16'>
min_bin: -21231 max_bin: 32444
Type data <class 'numpy.ndarray'>

The problem seems to be with np.arange, which fails when I give it the bin range taken from the np.min and np.max values of the .wav array. When I manually type the min and max integer values into np.arange, it has no problem. My hypothesis is that it is some sort of addressing error when referencing the .wav array, but I am not sure why it occurs or how to fix it.

Asked By: pproctor


Answers:

As part of computing the length of the result, numpy.arange calculates stop - start in Python object arithmetic, so the operands keep their int16 type. When stop and start are numpy.int16(32444) and numpy.int16(-21231), the true difference, 53675, exceeds the int16 maximum of 32767, so the subtraction overflows and wraps around to numpy.int16(-11861). This is where the warning comes from. The nonsense value leads numpy.arange to believe that the result should be a length-0 array.
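The wrap-around described above can be reproduced without the .wav file. The following is a minimal sketch that uses the two values from the question; the variable names are illustrative only, and np.errstate is used just to keep the demo quiet.

```python
import numpy as np

start = np.int16(-21231)
stop = np.int16(32444)

# int16 can only represent values from -32768 to 32767.
with np.errstate(over="ignore"):  # silence the overflow RuntimeWarning for this demo
    wrapped = stop - start        # int16 - int16 stays int16, so the result wraps

# Python ints have arbitrary precision, so the same subtraction is exact.
exact = int(stop) - int(start)

print(type(wrapped).__name__, int(wrapped))  # int16 -11861
print(exact)                                 # 53675
print(exact - 2**16)                         # -11861, the wrapped value
```

The exact difference minus 2**16 equals the wrapped result, which is the usual behaviour of 16-bit two's-complement arithmetic.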

The workaround is simple: convert the arguments to Python ints first, so that the subtraction cannot overflow. The dtype of the array itself can still be set to np.int16 to save space, since that’s all you need to store the necessary data.

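# Convert the NumPy int16 scalars to Python ints so stop - start cannot overflow.
min_bin = int(np.min(data[peaks]))
max_bin = int(np.max(data[peaks]))
# The bin edges themselves still fit in int16.
plt.hist(data[peaks], bins=np.arange(min_bin, max_bin, binwidth, dtype=np.int16))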
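To check the workaround without an audio file or a plot window, here is a small sketch under stated assumptions: the data array is synthetic (random int16 values pinned to the minimum and maximum from the question), and np.histogram stands in for plt.hist so that the bin edges and counts can be inspected directly.

```python
import numpy as np

# Synthetic stand-in for the int16 samples that scipy.io.wavfile would return.
rng = np.random.default_rng(0)
data = rng.integers(-21231, 32445, size=200_000).astype(np.int16)
data[0] = -21231  # pin the extremes to the values from the question
data[1] = 32444

binwidth = 1000

# Python ints: the length computation inside np.arange cannot overflow.
min_bin = int(np.min(data))
max_bin = int(np.max(data))
edges = np.arange(min_bin, max_bin, binwidth, dtype=np.int16)

# np.histogram is used here instead of plt.hist (an assumption for testability).
counts, _ = np.histogram(data, bins=edges)

print(len(edges), edges[0], edges[-1])  # 54 -21231 31769
print(counts.sum(), data.size)
```

Note that np.arange excludes the stop value, so the last edge (31769) lies below the maximum sample and values above it are not counted. If that matters, one option is to pass max_bin + binwidth as the stop value and drop dtype=np.int16, since the extra edge may no longer fit in int16.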
Answered By: nog642