Excluding rightmost edge in numpy.histogram
Question:
I have a list of numbers a and a list of bins which I shall use to bin the numbers in a using numpy.histogram. The bins are calculated from the mean and standard deviation (std) of a. So the number of bins is B, and the minimum value of the first bin is mean - std, the maximum of the last bin being mean + std. (The text in bold indicates my final goal)
An example goes like the following:
>>> a
array([1, 1, 3, 2, 2, 6])
>>> bins = np.linspace(mean - std, mean + std, B + 1)
>>> bins
array([0.79217487, 1.93072496, 3.06927504, 4.20782513])
>>> np.histogram(a, bins=bins)[0]
array([2, 3, 0], dtype=int32)
However, I want to exclude the rightmost edge of the last bin, i.e. if some value in a exactly equals mean + std, I do not wish to include it in the last bin. The detail about mean and std is not important; excluding the rightmost edge (aka making it a half-open interval) is. Unfortunately, the doc says in this regard:
All but the last (righthand-most) bin is half-open. In other words, if
bins is:
[1, 2, 3, 4] then the first bin is [1, 2) (including 1, but excluding
2) and the second [2, 3). The last bin, however, is [3, 4], which
includes 4.
Is there a simple solution I can employ? That is, one that does not involve manually fixing edges. That is something I can do, but that’s not what I’m looking for. Is there a flag I can pass or a different method I can use?
Answers:
Here’s one (kind of crude?) way to make the last bin half-open instead of closed. What I’m doing is moving the right side of the right-most bin down to the next smaller representable float:
import numpy as np
a = np.array([1, 1, 3, 2, 2, 6])
B = 3 # (in this example)
bins = np.linspace(a.mean() - a.std(), a.mean() + a.std(), B + 1)
# array([0.79217487, 1.93072496, 3.06927504, 4.20782513])
bins[-1] = np.nextafter(bins[-1], -np.inf) # <== this is the crucial line
np.histogram(a, bins=bins)
Simply subtracting np.finfo(float).eps is not reliable: eps is the spacing between adjacent floats at 1.0, so for an edge as large as 4.2 the subtraction rounds back to the same value and the bin stays closed. If your bin edges use some type other than float, pass both arguments to np.nextafter in that type. For example:
np.nextafter(np.float32(4.0), np.float32(-np.inf))
np.nextafter(np.longdouble(4.0), np.longdouble(-np.inf))
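A quick check of how the right edge behaves, as a sketch with made-up data (`data`, `edges`, and the other variable names are illustrative, not from the question). It assumes float64 edges and compares three cases: the default closed last bin, an edge reduced by np.finfo(float).eps, and an edge stepped down with np.nextafter.

```python
import numpy as np

data = np.array([2.0, 3.0, 4.0, 5.0])   # 5.0 sits exactly on the last edge
edges = np.array([2.0, 3.0, 4.0, 5.0])

# Default behaviour: the last bin [4, 5] is closed, so 5.0 is counted in it.
closed_counts, _ = np.histogram(data, bins=edges)

# Subtracting machine epsilon: eps is the float spacing at 1.0, which is
# only a quarter of the spacing near 5.0, so the result rounds back to 5.0.
eps_edges = edges.copy()
eps_edges[-1] -= np.finfo(float).eps
eps_moved = bool(eps_edges[-1] < edges[-1])

# Stepping to the next representable float below the edge always moves it.
open_edges = edges.copy()
open_edges[-1] = np.nextafter(open_edges[-1], -np.inf)
open_counts, _ = np.histogram(data, bins=open_edges)

print(closed_counts, eps_moved, open_counts)
```

Here closed_counts is [1, 1, 2], eps_moved is False, and open_counts is [1, 1, 1], so only the np.nextafter step drops the value lying on the right edge.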
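Filter the array first with a boolean mask. Do NOT use the numpy.clip() function: it just sets out-of-range data to the low/high clip values, which then get counted in the first and last bins and create artificial peaks at both ends. Because the mask uses a strict < on the upper bound, no remaining value can fall on the closed right edge of the last bin.
The following code worked for me. My case is an integer array, but I guess it should be OK with a float array too.
clip_low = a.mean() - a.std()   # lower bound, inclusive (I converted the bounds to int in my case)
clip_high = a.mean() + a.std()  # upper bound, exclusive (should be OK as float)
# keep values in [clip_low, clip_high); do NOT use np.clip() here
clipped = a[(clip_low <= a) & (a < clip_high)]
bins = int(clip_high - clip_low)  # must be an integer; use your own number of bins, e.g. B
hist, bin_edges = np.histogram(clipped, bins=bins, range=(clip_low, clip_high))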
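The masking idea can be wrapped in a small function. This is a sketch under assumptions: `histogram_right_open` and its parameters are hypothetical names, not part of NumPy, and the sample data is made up. It also runs np.clip on the same data to show the artificial peak in the last bin.

```python
import numpy as np

def histogram_right_open(values, low, high, n_bins):
    """Histogram over [low, high). Hypothetical helper, not a NumPy API."""
    values = np.asarray(values)
    # The strict < drops values equal to `high`, so nothing can land on
    # the closed right edge of the last bin.
    kept = values[(values >= low) & (values < high)]
    return np.histogram(kept, bins=n_bins, range=(low, high))

data = np.array([2.0, 3.0, 4.0, 5.0, 9.0])

# Masked version: 5.0 and 9.0 are discarded before counting.
counts, edges = histogram_right_open(data, 2.0, 5.0, 3)

# np.clip version: 9.0 is moved onto 5.0 and both pile up in the last bin.
clipped_counts, _ = np.histogram(np.clip(data, 2.0, 5.0), bins=3, range=(2.0, 5.0))

print(counts, clipped_counts)
```

With this data, counts is [1, 1, 1] while clipped_counts is [1, 1, 3].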