Is there a vectorized way to find maxes within labeled areas in NumPy?
Question:
I have a 2D array representing tree heights, where 0 is the ground. I have another array that’s always the same size showing segmented and labeled trees, where a 0 label means ground, and a positive integer value represents a unique tree. Here are some slices of the data:
heights = array([[37.5 , 41.82, 42.18, 42.18, 42.18, 39.23, 40.68, 40.71, 40.71,
40.19, 35.03, 41.41, 41.41, 41.41, 40.77, 32.23, 32.23, 32.23,
31.45, 25.6 , 25.63, 30.12, 30.78, 30.78, 30.92],
[37.5 , 37.5 , 41.82, 42.18, 41.78, 41.78, 40.68, 40.68, 40.68,
40.19, 41.04, 41.41, 41.41, 41.41, 41.03, 32.23, 32.23, 32.23,
31.25, 25.6 , 25.6 , 30.12, 30.12, 21.08, 30.88],
[37.5 , 37.5 , 34.61, 41.78, 41.78, 25.6 , 39.14, 40.68, 38.79,
38.79, 41.04, 41.04, 41.8 , 41.8 , 41.8 , 24.66, 24.66, 31.25,
25.63, 26.24, 26.2 , 25.2 , 24.93, 21.03, 21.03],
[34.53, 34.61, 34.61, 35.23, 35.23, 25.32, 25.32, 33.17, 33.17,
38.86, 39.4 , 40.31, 41.8 , 41.8 , 41.8 , 41.17, 25.37, 26.77,
27.32, 27.39, 27.39, 26.96, 25.2 , 28.68, 28.68],
[34.53, 34.52, 36.5 , 36.58, 36.67, 36.67, 25.15, 33.17, 38.65,
38.86, 39.4 , 39.53, 40.78, 41.17, 41.17, 0. , 26.77, 27.09,
27.39, 27.6 , 27.6 , 28. , 28.16, 28.68, 28.68],
[32.22, 36.45, 37.1 , 37.28, 37.28, 38.07, 30.98, 31.12, 38.65,
38.65, 39.12, 39.4 , 40.78, 40.78, 0. , 0. , 27.41, 27.72,
27.72, 28.49, 28.49, 28.16, 28.34, 28.87, 28.68],
[36.45, 37.1 , 37.1 , 37.28, 38.23, 38.23, 38.23, 33.61, 32.31,
38.65, 38.65, 38.62, 39.01, 33.75, 34.65, 34.65, 27.41, 27.72,
27.72, 28.49, 28.49, 28.49, 28.87, 30.31, 30.31],
[35.71, 36.45, 37.1 , 30.96, 38.23, 38.23, 38.23, 33.61, 33.28,
33.42, 33.5 , 33.5 , 33.51, 34.07, 34.65, 34.65, 27.36, 27.83,
27.83, 28.49, 28.49, 28.43, 28.87, 31.82, 31.68],
[14.44, 0. , 0. , 0. , 21.41, 32.98, 33.61, 33.61, 34.27,
34.8 , 34.8 , 33.5 , 33.4 , 34.07, 34.65, 34.65, 0. , 27.83,
27.83, 28.7 , 29.18, 29.18, 31.82, 31.82, 31.98],
[13.46, 0. , 0. , 21.41, 21.73, 31.36, 33.33, 33.33, 34.89,
34.99, 34.99, 32.72, 33.4 , 33.8 , 33.8 , 0. , 0. , 0. ,
28.7 , 28.7 , 29.64, 29.64, 31.82, 31.82, 35.82],
[13.46, 0. , 0. , 0. , 21.73, 31.36, 31.46, 35.81, 36.33,
36.33, 36.33, 32.72, 33.37, 33.71, 33.71, 0. , 0. , 0. ,
28.7 , 29.64, 29.64, 29.77, 29.77, 29.77, 35.95],
[ 0. , 0. , 0. , 0. , 0. , 24.07, 31.57, 35.9 , 36.33,
36.33, 36.33, 21.97, 32.72, 33.37, 33.37, 0. , 0. , 0. ,
28.36, 29.04, 29.64, 29.77, 29.77, 29.77, 35.95],
[ 0. , 0. , 0. , 0. , 22.09, 24.07, 23.92, 31.57, 35.9 ,
36.33, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
28.38, 29.53, 28.96, 28.96, 28.69, 29.19, 35.49],
[ 0. , 0. , 0. , 0. , 22.09, 22.09, 22.09, 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
29.53, 29.53, 29.82, 28.96, 28.73, 29.19, 29.19],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
29.53, 30.12, 30.12, 29.82, 28.73, 0. , 28.89],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 30.12, 30.12, 30.12, 28.94, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 30.12, 30.12, 29.82, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 28.65, 28.65, 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. ]], dtype=float32)
labeled_trees = array([[33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 37, 37, 37, 37, 37, 37,
37, 37, 37, 37, 39, 39, 39, 39, 39],
[33, 33, 33, 33, 33, 33, 33, 33, 33, 37, 37, 37, 37, 37, 37, 37,
37, 37, 37, 37, 39, 39, 39, 39, 39],
[33, 33, 33, 33, 33, 33, 33, 33, 33, 37, 37, 37, 37, 37, 37, 37,
37, 37, 37, 39, 39, 39, 39, 39, 39],
[33, 33, 33, 33, 33, 33, 33, 33, 37, 37, 37, 37, 37, 37, 37, 37,
37, 37, 39, 39, 39, 39, 39, 39, 39],
[33, 33, 33, 33, 33, 33, 33, 37, 37, 37, 37, 37, 37, 37, 37, 0,
39, 39, 39, 39, 39, 39, 39, 39, 39],
[33, 33, 33, 33, 33, 33, 33, 37, 37, 37, 37, 37, 37, 37, 0, 0,
39, 39, 39, 39, 39, 39, 39, 39, 39],
[33, 33, 33, 33, 33, 33, 33, 33, 37, 37, 37, 37, 37, 37, 37, 37,
37, 39, 39, 39, 39, 39, 39, 39, 39],
[33, 33, 33, 33, 33, 33, 33, 33, 33, 37, 37, 37, 37, 37, 37, 37,
37, 39, 39, 39, 39, 39, 39, 39, 39],
[33, 0, 0, 0, 33, 33, 33, 33, 33, 33, 33, 33, 37, 37, 37, 37,
0, 39, 39, 39, 39, 39, 39, 39, 39],
[33, 0, 0, 33, 33, 33, 33, 33, 33, 33, 33, 33, 37, 37, 37, 0,
0, 0, 39, 39, 39, 39, 39, 39, 39],
[33, 0, 0, 0, 33, 33, 33, 33, 33, 33, 33, 33, 37, 37, 37, 0,
0, 0, 39, 39, 39, 39, 39, 39, 39],
[ 0, 0, 0, 0, 0, 33, 33, 33, 33, 33, 33, 33, 37, 37, 37, 0,
0, 0, 39, 39, 39, 39, 39, 39, 39],
[ 0, 0, 0, 0, 33, 33, 33, 33, 33, 33, 0, 0, 0, 0, 0, 0,
0, 0, 39, 39, 39, 39, 39, 39, 39],
[ 0, 0, 0, 0, 33, 33, 33, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 39, 39, 39, 39, 39, 39, 39],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 39, 39, 39, 39, 39, 0, 39],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 39, 39, 39, 39, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 39, 39, 39, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 39, 39, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)
I’d like to find the max height within each labeled region. I have done this successfully with a for loop, but it’s slow.
max_heights = {}
for label in list(np.unique(labeled_trees))[1:]:
    tree_height = np.amax(heights[labeled_trees == label])
    max_heights[str(label)] = tree_height
# max_heights = {'33': 42.18, '37': 41.8, '39': 35.95}
Is there a faster/vectorized/more efficient way of finding the max values within labeled regions of a numpy array? The ideal output would be a boolean array where the location of each max is True.
[EDIT]
The maximum_position function from scipy.ndimage is promising, but it looks like it only returns the first location where the pixel equals the local max. I need every location within a labeled region that equals its max.
Answers:
This returns the ideal output you need, but it is not fast enough. On my machine, it needs about 60 µs:
def max_mask(labeled_trees, heights):
    cmp = labeled_trees.reshape(1, -1) != np.unique(labeled_trees)[1:, None]
    indices = np.ma.masked_array(np.broadcast_to(heights.ravel(), cmp.shape), cmp).argmax(-1)
    ret = np.zeros(heights.size, bool)
    ret[indices] = True
    return ret.reshape(heights.shape)
Some explanations:
- The first step uses broadcasting to compare each value of np.unique(labeled_trees)[1:] with labeled_trees.ravel(), which gives a 2D array of shape (np.unique(labeled_trees)[1:].size, labeled_trees.size). The equivalent code is:
    cmp = np.array([labeled_trees.ravel() != elem for elem in np.unique(labeled_trees)[1:]])
- The second step flattens heights and broadcasts it to the shape of cmp as the data of np.ma.masked_array, with cmp as the mask, and then calls argmax on the masked array, which finds the position of the maximum among the unmasked values in each row. The equivalent code is:
    indices = np.array([np.ma.masked_array(heights, mask).argmax() for mask in cmp])
- The remaining steps are simple. We now have the flat position of the maximum height for each label. Create a bool array of the same size, set those positions to True, then reshape and return.
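The three steps above can be traced on a tiny input. This is only an illustrative sketch: the arrays `labels` and `h` are made-up toy data, not the arrays from the question.

```python
import numpy as np

# Toy data (hypothetical): two trees labeled 1 and 2, label 0 is ground.
labels = np.array([[1, 1, 0],
                   [2, 2, 0]])
h = np.array([[3.0, 5.0, 0.0],
              [4.0, 1.0, 0.0]])

tree_ids = np.unique(labels)[1:]  # drop the ground label 0

# Step 1: one row per tree; True marks pixels that do NOT belong to that tree.
cmp = labels.reshape(1, -1) != tree_ids[:, None]  # shape (2, 6)

# Step 2: hide foreign pixels behind the mask, then take each row's argmax.
flat = np.broadcast_to(h.ravel(), cmp.shape)
indices = np.ma.masked_array(flat, cmp).argmax(-1)  # flat positions of the maxima

# Step 3: mark those flat positions and restore the 2D shape.
mask = np.zeros(h.size, bool)
mask[indices] = True
mask = mask.reshape(h.shape)
```

Here `indices` holds one flat position per tree, which is why this version marks only the first maximum of each tree.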
Test:
>>> print(max_mask(labeled_trees, heights).astype(int))
[[0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
>>> heights[max_mask(labeled_trees, heights)]
array([42.18, 41.8 , 35.95])
Better performance version: here I follow the implementation of masked_array.argmax to get a faster method. On my machine, it only needs about 22 μs:
def max_mask(labeled_trees, heights):
    cmp = labeled_trees.reshape(1, -1) != np.unique(labeled_trees)[1:, None]
    indices = np.where(cmp, -np.inf, heights.ravel()).argmax(-1)
    ret = np.zeros(heights.size, bool)
    ret[indices] = True
    return ret.reshape(heights.shape)
To avoid the possible copy caused by labeled_trees.reshape, it can be changed to the following form:
def max_mask(labeled_trees, heights):
    cmp = labeled_trees[None] != np.unique(labeled_trees)[1:, None, None]
    indices = np.where(cmp, -np.inf, heights).reshape(-1, heights.size).argmax(-1)
    ret = np.zeros(heights.size, bool)
    ret[indices] = True
    return ret.reshape(heights.shape)
Version more consistent with the ideal output: I noticed that you asked for "every location within a labeled region that equals its max", so I updated the answer again. It needs about 34 μs to run on my machine:
def max_mask(labeled_trees, heights):
    cmp = labeled_trees[None] != np.unique(labeled_trees)[1:, None, None]
    masked = np.where(cmp, -np.inf, heights).reshape(-1, heights.size)
    return (masked.max(-1, keepdims=True) == masked).any(0).reshape(heights.shape)
Test:
>>> print(max_mask(labeled_trees, heights).astype(int))
[[0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
>>> heights[max_mask(labeled_trees, heights)]
array([42.18, 42.18, 42.18, 42.18, 41.8 , 41.8 , 41.8 , 41.8 , 41.8 ,
41.8 , 35.95, 35.95])
Supplement: the above version is vectorized as much as possible, but it requires a large amount of memory (especially when np.unique(labeled_trees) is very large), because it builds one full-size row per label. I tested it with random numbers and found that it slows down severely because of memory pressure. Therefore, a solution using a loop is provided here. It requires very little memory:
def max_mask_loop(labeled_trees, heights):
    ret = np.zeros(heights.shape, bool)
    for val in np.unique(labeled_trees)[1:]:
        masked = np.where(labeled_trees != val, -np.inf, heights)
        ret |= masked.max() == masked
    return ret
Comparison:
>>> from timeit import timeit
>>> heights = np.random.rand(2500, 2500)
>>> labeled_trees = np.random.randint(0, 300, heights.shape)
>>> timeit(lambda: max_mask(labeled_trees, heights), number=1)
37.30420290003531
>>> timeit(lambda: max_mask_loop(labeled_trees, heights), number=1)
9.9376986999996
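Before comparing timings, it may be worth confirming that the two functions return the same mask. A minimal sketch, assuming seeded random test data (`small_heights` and `small_labels` are made-up names) with integer-valued heights so that ties occur inside each label:

```python
import numpy as np

def max_mask(labeled_trees, heights):
    # Broadcast version: one full-size row per label.
    cmp = labeled_trees[None] != np.unique(labeled_trees)[1:, None, None]
    masked = np.where(cmp, -np.inf, heights).reshape(-1, heights.size)
    return (masked.max(-1, keepdims=True) == masked).any(0).reshape(heights.shape)

def max_mask_loop(labeled_trees, heights):
    # Loop version: one label at a time, low memory.
    ret = np.zeros(heights.shape, bool)
    for val in np.unique(labeled_trees)[1:]:
        masked = np.where(labeled_trees != val, -np.inf, heights)
        ret |= masked.max() == masked
    return ret

# Hypothetical test data: few distinct heights, so each label has several maxima.
rng = np.random.default_rng(0)
small_heights = rng.integers(1, 5, size=(20, 20)).astype(float)
small_labels = rng.integers(0, 6, size=(20, 20))

same = np.array_equal(max_mask(small_labels, small_heights),
                      max_mask_loop(small_labels, small_heights))
```

If `same` is True on a few such inputs, the timing comparison is at least comparing equivalent outputs.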
Below is a pure NumPy implementation using reduceat:
## Step 1: flatten the arrays, keeping only tree pixels (label > 0)
height_1d = heights[labeled_trees > 0]
labeled_trees_1d = labeled_trees[labeled_trees > 0]
## Step 2: sort both arrays by label while maintaining their relationship
sort_idx = labeled_trees_1d.argsort()
sorted_heights = height_1d[sort_idx]
sorted_labeled_trees = labeled_trees_1d[sort_idx]
## Step 3: find the index where each label's run starts
_, idx = np.unique(sorted_labeled_trees, return_index=True)
## Step 4: use np.maximum.reduceat for the per-label maxima and a dict comprehension for the final output
{str(key): value for key, value in zip(sorted_labeled_trees[idx], np.maximum.reduceat(sorted_heights, idx))}
Output:
Here is a much simpler use of np.maximum.reduceat:
idx = labeled_trees.argsort(None)
sorted_labeled_trees = labeled_trees.ravel()[idx]
sorted_heights = heights.ravel()[idx]
bins = np.flatnonzero(np.diff(sorted_labeled_trees) != 0) + 1
max_heights = np.maximum.reduceat(sorted_heights, bins)
max_trees = sorted_labeled_trees[bins]
If you insist on a dictionary, you can make one with zip:
result = dict(zip(max_trees, max_heights))
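To see what the sorted arrays and bins look like, here is the same recipe on a tiny input. The arrays `labels` and `h` are made-up toy data. Note that the `+ 1` bins start at the second run of labels, so the first sorted label is skipped; the sketch assumes that label is the ground label 0, i.e. that at least one ground pixel exists.

```python
import numpy as np

# Toy data (hypothetical): label 0 is ground, trees are 1 and 2.
labels = np.array([[0, 2, 2],
                   [1, 1, 0]])
h = np.array([[0.0, 3.0, 7.0],
              [4.0, 6.0, 0.0]])

idx = labels.argsort(None)            # sort order of the flattened labels
sorted_labels = labels.ravel()[idx]   # [0, 0, 1, 1, 2, 2]
sorted_h = h.ravel()[idx]

# Start index of every run except the first one (the ground pixels).
bins = np.flatnonzero(np.diff(sorted_labels) != 0) + 1

max_h = np.maximum.reduceat(sorted_h, bins)  # max of each run from its start index
max_t = sorted_labels[bins]                  # the label of each run

# Plain Python types make the dictionary easier to print and compare.
result = {int(t): float(m) for t, m in zip(max_t, max_h)}
```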
If you want a mask of the positions where the maxima occur and the number of trees is relatively small, you can compute the mask more-or-less directly using broadcasting:
peak_mask = ((max_trees == labeled_trees[..., None]) & (max_heights == heights[..., None])).any(-1)
If the number of trees is not small, you will be better off using a loop over the labels:
peak_mask = np.zeros(labeled_trees.shape, bool)
for t, h in zip(max_trees, max_heights):
    peak_mask |= (labeled_trees == t) & (heights == h)
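Another way to avoid both the per-label loop and the large broadcast is a lookup table filled with np.maximum.at. This is a swapped-in technique, not part of the answer above, and only a sketch: the function name `peak_mask_lookup` is made up, labels are assumed to be non-negative integers of modest size (the table has `labeled_trees.max() + 1` entries), heights are assumed to be floating point, and its speed is not measured here.

```python
import numpy as np

def peak_mask_lookup(labeled_trees, heights):
    # One slot per possible label value, starting at -inf (requires float heights).
    lut = np.full(labeled_trees.max() + 1, -np.inf, dtype=heights.dtype)
    # Unbuffered in-place maximum: lut[label] becomes the max height seen for that label.
    np.maximum.at(lut, labeled_trees.ravel(), heights.ravel())
    # A pixel is a peak if it belongs to a tree and equals that tree's maximum.
    return (labeled_trees != 0) & (heights == lut[labeled_trees])

# Toy data (hypothetical) with ties inside each tree.
toy_labels = np.array([[1, 1, 0],
                       [2, 2, 2]])
toy_heights = np.array([[5.0, 5.0, 0.0],
                        [1.0, 4.0, 4.0]])
toy_mask = peak_mask_lookup(toy_labels, toy_heights)
```

The extra memory here is one table entry per label value plus one full-size temporary, so it stays small when there are many trees.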