Excluding rightmost edge in numpy.histogram

Question:

I have a list of numbers a and a list of bins, which I use to bin the numbers in a with numpy.histogram. The bins are calculated from the mean and standard deviation (std) of a: the number of bins is B, the minimum of the first bin is mean - std, and the maximum of the last bin is mean + std.

An example goes like the following:

>>> a
array([1, 1, 3, 2, 2, 6])

>>> bins = np.linspace(mean - std, mean + std, B + 1)
>>> bins
array([ 0.79217487,  1.93072496,  3.06927504,  4.20782513])

>>> np.histogram(a, bins = bins)[0]
array([2, 3, 0], dtype=int32)

However, I want to exclude the rightmost edge of the last bin, i.e. if some value in a exactly equals mean + std, I do not want it counted in the last bin. The detail about mean and std is not important; what matters is excluding the rightmost edge (that is, making the last bin a half-open interval). Unfortunately, the documentation says:

All but the last (righthand-most) bin is half-open. In other words, if
bins is:

[1, 2, 3, 4] then the first bin is [1, 2) (including 1, but excluding
2) and the second [2, 3). The last bin, however, is [3, 4], which
includes 4.
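
The documented behavior can be checked with a small sketch that uses the same edges as the quoted example (the variable names here are only illustrative):

```python
import numpy as np

edges = [1, 2, 3, 4]
values = np.array([1, 2, 3, 4])

counts, _ = np.histogram(values, bins=edges)
# Bins are [1, 2), [2, 3) and [3, 4]: the value 4 is counted in the last bin.
print(counts)  # [1 1 2]
```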

Is there a simple solution I can employ? That is, one that does not involve manually fixing edges. That is something I can do, but that’s not what I’m looking for. Is there a flag I can pass or a different method I can use?

Asked By: amzon-ex


Answers:

Here’s one (kind of crude?) way to make the last bin half-open instead of closed. What I’m doing is subtracting a tiny value, based on machine epsilon, from the right edge of the right-most bin:

import numpy as np

a = np.array([1, 1, 3, 2, 2, 6])
B = 3 # (in this example)
bins = np.linspace(a.mean() - a.std(), a.mean() + a.std(), B + 1)
# array([ 0.79217487,  1.93072496,  3.06927504,  4.20782513])
# <== the next line is the crucial one; eps is scaled by the edge so the subtraction is not rounded away
bins[-1] -= np.finfo(float).eps * abs(bins[-1])
np.histogram(a, bins = bins)

If the values in a use a floating-point type other than float, pass that type to finfo instead. For example:

np.finfo(float).eps
np.finfo(np.float128).eps
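
One caveat with any epsilon-based adjustment: np.finfo(float).eps is the spacing between floats just above 1.0, so subtracting it unscaled from a larger edge can be rounded away and leave the edge unchanged. A possible alternative is np.nextafter, which steps to the adjacent representable float at any magnitude. The sketch below is not part of the answer above, and half_open_histogram is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def half_open_histogram(values, edges):
    """Histogram whose last bin is [edges[-2], edges[-1]) instead of closed.

    Hypothetical helper, not part of NumPy.
    """
    edges = np.array(edges, dtype=float)  # copy, so the caller's edges are untouched
    # Move the last edge to the next representable float below it.
    edges[-1] = np.nextafter(edges[-1], -np.inf)
    counts, _ = np.histogram(values, bins=edges)
    return counts

edge = 5.0
# An unscaled eps is smaller than half the float spacing near 5.0,
# so the subtraction rounds back to the same number.
print(edge - np.finfo(float).eps == edge)   # True
# nextafter always yields a strictly smaller float.
print(np.nextafter(edge, -np.inf) < edge)   # True

# The value 4 now falls outside the last bin.
print(half_open_histogram([1, 2, 3, 4], [1, 2, 3, 4]))  # [1 1 1]
```

Values equal to the original last edge are then simply dropped, the same as any other out-of-range value.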
Answered By: Roy2012

Filter the array first. Do NOT use the numpy.clip() function for this: it sets out-of-range values to the low/high limits, so they are counted in the first and last bins, which creates spurious peaks at both ends.

The following code worked for me. My case was an integer array; I guess it should also be OK with a float array.

import numpy as np

a = np.array([1, 1, 3, 2, 2, 6])
clip_low  = a.mean() - a.std()                   # I converted the limits to int
clip_high = a.mean() + a.std()                   # float should be OK too
clipped = a[(clip_low <= a) & (a < clip_high)]   # drops values >= clip_high (do NOT use np.clip())
bins = int(clip_high - clip_low)                 # or use your own number of bins (must be an int)
hist, bin_edges = np.histogram(clipped, bins=bins, range=(clip_low, clip_high))
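
To illustrate why np.clip() is the wrong tool here, the following sketch compares it with a boolean mask on the array from the question. The bin count of 3 and the variable names are assumptions made for the example.

```python
import numpy as np

a = np.array([1, 1, 3, 2, 2, 6])
low, high = a.mean() - a.std(), a.mean() + a.std()

# np.clip keeps every element: the 6 is pulled down to `high`
# and ends up counted in the last bin.
clipped_counts, _ = np.histogram(np.clip(a, low, high), bins=3, range=(low, high))

# A boolean mask drops out-of-range elements instead.
masked = a[(low <= a) & (a < high)]
masked_counts, _ = np.histogram(masked, bins=3, range=(low, high))

print(clipped_counts)  # [2 3 1]
print(masked_counts)   # [2 3 0]
```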
Answered By: Dean Liu