Apply numpy.histogram to a multi-dimensional array along an axis
Question:
I want to apply numpy.histogram() to a multi-dimensional array along an axis.
For example, I have a 2D array and I want to apply histogram() along axis=1.
Code:
import numpy
array = numpy.array([[0.6, 0.7, -0.3, 1.0, -0.8],
                     [0.2, -1.0, -0.5, 0.5, 0.8],
                     [0.25, 0.3, -0.1, -0.8, 1.0]])
bins = [-1.0, -0.5, 0, 0.5, 1.0, 1.0]
hist, bin_edges = numpy.histogram(array, bins)
print(hist)
Output:
[3 3 3 4 2]
Expected Output:
[[1 1 0 2 1],
[1 1 1 2 0],
[1 1 2 0 1]]
How can I get my expected output?
I tried to use the solution suggested in this post, but it doesn’t get me to the expected output.
Answers:
For n-d cases, you can do this with np.histogram2d just by making a dummy x-axis (i):
import numpy as np

def vec_hist(a, bins):
    # Dummy x-coordinate: one integer label per 1-D slice along the last axis
    n = int(np.prod(a.shape[:-1]))
    i = np.repeat(np.arange(n), a.shape[-1])
    return np.histogram2d(i, a.flatten(), (n, bins))  # (H, xedges, yedges)
Output
vec_hist(array, bins)
Out[453]:
(array([[ 1., 1., 0., 2., 1.],
[ 1., 1., 1., 2., 0.],
[ 1., 1., 2., 0., 1.]]),
array([ 0. , 0.66666667, 1.33333333, 2. ]),
array([-1. , -0.5 , 0. , 0.5 , 0.9999999, 1. ]))
For histograms over an arbitrary axis, you’ll probably need to create i using np.meshgrid and np.ravel_multi_index, and then use that to reshape the resulting histogram.
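As a rough sketch of the arbitrary-axis case, the version below takes a different route from the np.meshgrid idea: it uses np.moveaxis to bring the chosen axis to the end and then applies the same dummy-axis trick. The function name hist_along_axis and its axis parameter are made up for illustration, and the sketch assumes every leading dimension is non-empty.

```python
import numpy as np

def hist_along_axis(a, bins, axis=-1):
    """Histogram every 1-D slice of `a` taken along `axis` (hypothetical helper)."""
    a = np.asarray(a)
    # Bring the target axis to the end so each slice is contiguous once flattened.
    moved = np.moveaxis(a, axis, -1)
    lead_shape = moved.shape[:-1]
    n_slices = int(np.prod(lead_shape))
    # Dummy coordinate: slice number of every element, in flattened order.
    i = np.repeat(np.arange(n_slices), moved.shape[-1])
    H, _, _ = np.histogram2d(i, moved.reshape(-1), bins=(n_slices, bins))
    # One histogram per slice, laid out like the remaining axes of `a`.
    return H.reshape(lead_shape + (-1,))

array = np.array([[0.6, 0.7, -0.3, 1.0, -0.8],
                  [0.2, -1.0, -0.5, 0.5, 0.8],
                  [0.25, 0.3, -0.1, -0.8, 1.0]])
bins = [-1.0, -0.5, 0, 0.5, 1.0, 1.0]

per_row = hist_along_axis(array, bins, axis=1)     # shape (3, 5)
per_column = hist_along_axis(array, bins, axis=0)  # shape (5, 5)
```

With axis=1 this should give the per-row counts asked for in the question; with axis=0 each row of the result is the histogram of one column of the input.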
NumPy has changed since Daniel F's answer was written, and that answer as originally posted no longer works.
Here is a simplified version for 2D arrays:
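import numpy as np

def vec_hist(a, bins):
    # Row index of every element, in the same order as a.flatten()
    i = np.tile(np.arange(a.shape[0]), (a.shape[1], 1)).T.flatten()
    # Bin the row index into a.shape[0] bins and the values into `bins`
    H, _, _ = np.histogram2d(i, a.flatten(), (a.shape[0], bins))
    return H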
First we build an array of coordinates "i" that holds, for each element of the data, the index of the row it belongs to.
Then we flatten it, along with the data array "a", before passing both to np.histogram2d. The bins argument of histogram2d gives the bins for the dummy dimension and for the dimension we actually want, in that order.
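To make the dummy coordinate concrete, here is a small sketch that prints it for the 3x5 array from the question. The variable name i_repeat is only for illustration; it shows that np.repeat builds the same index array.

```python
import numpy as np

a = np.array([[0.6, 0.7, -0.3, 1.0, -0.8],
              [0.2, -1.0, -0.5, 0.5, 0.8],
              [0.25, 0.3, -0.1, -0.8, 1.0]])

# Row index of every element, in the same order as a.flatten()
i = np.tile(np.arange(a.shape[0]), (a.shape[1], 1)).T.flatten()
print(i)  # [0 0 0 0 0 1 1 1 1 1 2 2 2 2 2]

# np.repeat produces the same index array more directly
i_repeat = np.repeat(np.arange(a.shape[0]), a.shape[1])
```

Each (i, value) pair then lands in the histogram2d cell whose first index is the row and whose second index is the value's bin.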
Code:
array = np.array([[0.6, 0.7, -0.3, 1.0, -0.8],
                  [0.2, -1.0, -0.5, 0.5, 0.8],
                  [0.25, 0.3, -0.1, -0.8, 1.0]])
bins = [-1.0, -0.5, 0, 0.5, 1.0, 1.0]
vec_hist(array, bins)
Output
array([[1., 1., 0., 2., 1.],
[1., 1., 1., 2., 0.],
[1., 1., 2., 0., 1.]])
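As a cross-check, the same counts can be obtained with a plain loop that calls np.histogram once per row. This is only a baseline sketch (hist_rows_loop is a made-up name) and will likely be slower for arrays with many rows, but it returns integer counts, whereas np.histogram2d returns floats.

```python
import numpy as np

def hist_rows_loop(a, bins):
    # Call np.histogram once per row and stack the integer counts.
    return np.stack([np.histogram(row, bins)[0] for row in a])

array = np.array([[0.6, 0.7, -0.3, 1.0, -0.8],
                  [0.2, -1.0, -0.5, 0.5, 0.8],
                  [0.25, 0.3, -0.1, -0.8, 1.0]])
bins = [-1.0, -0.5, 0, 0.5, 1.0, 1.0]

counts = hist_rows_loop(array, bins)
print(counts)
```

For the example data this should match the expected output from the question.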