Extracting the dimensions of a rectangle

Question:

So I’m working on an image processing project with sonar images. To be more specific, I am trying to extract the dimensions of an image of a pool taken by a sonar scanner. I was able to extract the rectangular region of the pool but I cannot figure out how to get the dimensions of each edge in terms of pixels. I am working with OpenCV in Python.

I would appreciate any suggestions on how this could be done.

I already tried finding the line intersections of hough lines but that did not have promising results.

Code so far.

import cv2 
import numpy as np 
from scipy import ndimage as ndi
from scipy.ndimage.measurements import label

def largest_component(indices):
    #this function takes a list of indices denoting
    #the white regions of the image and returns the largest 
    #white object of connected indices 

    return_arr = np.zeros((512,512), dtype=np.uint8)
    for index in indices:
        return_arr[index[0]][index[1]] = 255

    return return_arr

image = cv2.imread('sonar_dataset/useful/sonarXY_5.bmp', 0)
image_gaussian = ndi.gaussian_filter(image, 4)
image_gaussian_inv = cv2.bitwise_not(image_gaussian)
kernel = np.ones((3,3),np.uint8)

# double thresholding extracting the sides of the rectangle
ret1, img1 = cv2.threshold(image_gaussian_inv, 120, 255, cv2.THRESH_BINARY)
ret2, img2 = cv2.threshold(image_gaussian_inv, 150, 255, cv2.THRESH_BINARY)

double_threshold = img1 - img2
closing = cv2.morphologyEx(double_threshold, cv2.MORPH_CLOSE, kernel)

labeled, ncomponents = label(closing, kernel)
indices = np.indices(closing.shape).T[:,:,[1, 0]]
twos = indices[labeled == 2]
area =[np.sum(labeled==val) for val in range(ncomponents+1)]

rectangle = largest_component(twos)

cv2.imshow('rectangle', rectangle)
cv2.waitKey(0)

The original image and extracted object are below.

Original Image

Extracted Object

Asked By: rockmyboat


Answers:

It’s not perfect, but this simple approach should be a good starting point for you:

import cv2, math
import numpy as np

img = cv2.imread(R'D:\dev\projects\stackoverflow\dimensions_of_rectangle\img1.png')
print(img.shape)
img_moments = cv2.moments(img[:,:,0]) # use only one channel here (cv2.moments operates only on single-channel images)
print(img_moments)
# print(dir(img_moments))

# calculate centroid (center of mass of image)
x = img_moments['m10'] / img_moments['m00']
y = img_moments['m01'] / img_moments['m00']

# calculate orientation of image intensity (it corresponds to the image intensity axis)
u00 = img_moments['m00']
u20 = img_moments['m20'] - x*img_moments['m10']
u02 = img_moments['m02'] - y*img_moments['m01']
u11 = img_moments['m11'] - x*img_moments['m01']

u20_prim = u20/u00
u02_prim = u02/u00
u11_prim = u11/u00

angle = 0.5 * math.atan(2*u11_prim / (u20_prim - u02_prim))
print('The image should be rotated by: ', math.degrees(angle) / 2.0, ' degrees')

rows, cols = img.shape[:2] # note: shape is (rows, cols)
# rotate the image by half of this angle
rotation_matrix = cv2.getRotationMatrix2D((cols/2,rows/2), math.degrees(angle / 2.0), 1)
img_rotated = cv2.warpAffine(img, rotation_matrix ,(cols,rows))
# print(img_rotated.shape, img_rotated.dtype)

cv2.imwrite(R'D:\dev\projects\stackoverflow\dimensions_of_rectangle\img1_rotated.png', img_rotated)

img_rotated_clone = np.copy(img_rotated)
img_rotated_clone2 = np.copy(img_rotated)

# first method - just calculate bounding rect
bounding_rect = cv2.boundingRect(img_rotated[:, :, 0])
cv2.rectangle(img_rotated_clone, (bounding_rect[0], bounding_rect[1]), 
    (bounding_rect[0] + bounding_rect[2], bounding_rect[1] + bounding_rect[3]), (255,0,0), 2)

# second method - find columns and rows with biggest sums
def nlargest_cols(a, n):
    col_sums = [(np.sum(col), idx) for idx, col in enumerate(a.T)]
    return sorted(col_sums, key=lambda a: a[0])[-n:]

def nlargest_rows(a, n):
    row_sums = [(np.sum(row), idx) for idx, row in enumerate(a)]
    return sorted(row_sums, key=lambda a: a[0])[-n:]

top15_cols_indices = nlargest_cols(img_rotated[:,:,0], 15)
top15_rows_indices = nlargest_rows(img_rotated[:,:,0], 15)

for a in top15_cols_indices:
    cv2.line(img_rotated_clone, (a[1], 0), (a[1], rows), (0, 255, 0), 1)

for a in top15_rows_indices:
    cv2.line(img_rotated_clone, (0, a[1]), (cols, a[1]), (0, 0, 255), 1)

cv2.imwrite(R'D:\dev\projects\stackoverflow\dimensions_of_rectangle\img2.png', img_rotated_clone)

Of course you need to adjust paths. img1.png is the second image from your question, img1_rotated is the result of rotating the image:

(image: img1_rotated.png)

and img2 is the final output:

(image: img2.png)
The blue rectangle is method 1 (just a bounding rect), and the green and red lines (15 of each, all 1 pixel wide) are method 2.

The algorithm is quite simple:

  1. Calculate image moments to determine the main axis of image intensity (I don’t know how to describe it well – check the wiki page https://en.wikipedia.org/wiki/Image_moment#Examples_2 ). Basically this is the angle by which you have to rotate the image to make white pixels distributed horizontally or vertically.
  2. Once you know the angle – rotate the image (and save the result).
  3. Method 1 – calculate and draw the bounding rect of all white pixels in the rotated image.
  4. Method 2 – find 15 rows and 15 cols with biggest sums (== biggest count of white pixels) and draw horizontal/vertical lines in those rows/columns. Note that the number 15 was selected by trial and error, but it should be easy to select 2 columns (and rows) with big sum which are not close to each other. Those columns/rows are good candidates to be rectangle boundaries.
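
As a rough, untested sketch of how those candidate columns/rows could be turned into actual pixel dimensions (this reuses nlargest_cols, nlargest_rows and img_rotated from the snippet above, and assumes the high-sum columns/rows really do cluster around the two vertical/horizontal edges, as in the image above):

# Rough sketch: estimate the rectangle dimensions from the spread of the
# highest-sum columns and rows of the rotated image.
col_idx = sorted(idx for _, idx in nlargest_cols(img_rotated[:,:,0], 15))
row_idx = sorted(idx for _, idx in nlargest_rows(img_rotated[:,:,0], 15))

# The candidate columns cluster around the left and right edges of the
# rectangle, so the spread between the smallest and largest index
# approximates the width; the same idea applies to the rows and the height.
width_px = col_idx[-1] - col_idx[0]
height_px = row_idx[-1] - row_idx[0]
print('Approximate width x height in pixels:', width_px, height_px)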

Hope this is what you were looking for; let me know if you have any questions.

Answered By: cyriel

So here’s what I came up with – it’s a bit labour intensive but it does get us to the right answer eventually. I will be directly using the connected components output that you have shown with the last image.

  1. Use morphological image skeletonization so that we get the skeleton of the blob. This way, it’ll give us the most minimal contour representation such that we get a one pixel wide boundary that goes through the middle of each thick edge. You can achieve this through Scikit-image’s skeletonize method.

  2. Use the Hough Transform which is a line detection method on the skeletonized image. In summary, it parameterizes lines in the polar domain and the output would be a set of rho and theta that tell us which lines are detected in the skeletonized image. We can use OpenCV’s cv2.HoughLines for that. It is very important that you do this on the skeletonized image or we will have a lot of candidate lines parallel to where the true delineation of the bounding box is and you wouldn’t be able to distinguish between them.

  3. Take each pair of lines and find their point of intersection. We’d expect that over all pairs of lines, there will be 4 predominant clusters of intersections that give us the corners of the rectangle.

  4. Because of the noise in the contours, we may get more than four points of intersection. We can use the convex hull to finally get 4 points of intersection for the rectangle. In summary, the convex hull algorithm operates on a list of points and finds the subset of points that minimally encloses all of them. We can use cv2.convexHull.

  5. Finally, due to the quantization of the Hough Transform, there may be multiple points that are within the vicinity of each corner. Therefore, apply K-Means clustering to find 4 clusters of points and thus find their centroids. We can use cv2.kmeans for that.

  6. Once we find the centroids, we can simply iterate through each pair of points in a cyclical fashion to finally find the distances to each corner and thus find the distances you care about.

Let’s step through each point one by one:

Step #1 – Morphological Image Skeletonization

Using Scikit-image’s skeletonize, we can skeletonize the connected components image that you’ve shown above. Take note that you need to convert the image to binary before you proceed. Once you call the method, we’ll need to convert back to unsigned 8-bit integers for the rest of the process. I’ve downloaded the image above and saved it locally. We can run the skeletonize method after:

from skimage.morphology import skeletonize

im = cv2.imread('K7ELI.png', 0)
out = skeletonize(im > 0)

# Convert to uint8
out = 255*(out.astype(np.uint8))

We get this image:

fig1

Step #2 – Use the Hough Transform

Using the Hough Transform, we can detect the most prominent lines in this image:

lines = cv2.HoughLines(out,1,np.pi/180,60)

Here we specify the search space so that the distance bins have a length of 1 pixel and the angle bins have a width of 1 degree, or pi / 180 radians. In summary, the Hough Transform looks at each edge point, iterates through a range of angles theta subtended from the origin to that edge point, and calculates the corresponding value of rho with respect to the bin size. Each (rho, theta) pair gets logged into a 2D histogram and registers a vote. We threshold this 2D histogram so that any bins beyond a certain value are line candidates. In the above line of code, we set the threshold for the bin counts to 60.

This code is optional, but I wanted to show you what the visualized lines look like:

img_colour = np.dstack([im, im, im])
lines = cv2.HoughLines(out,1,np.pi/180,60)
for rho,theta in lines[:,0]:
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a*rho
    y0 = b*rho
    x1 = int(x0 + 1000*(-b))
    y1 = int(y0 + 1000*(a))
    x2 = int(x0 - 1000*(-b))
    y2 = int(y0 - 1000*(a))
    cv2.line(img_colour,(x1,y1),(x2,y2),(0,0,255),2)

I pulled this code from the OpenCV Hough Lines tutorial. It draws the detected lines on the image in red. I get the following image:

fig2

As we can see, there are four points of intersection in the image. It’s our job next to find these points of intersection.

Step #3 – Find points of intersection

In the Hough Transform, we can relate the length of the line from the origin to a point (x, y) in the image subtended at the angle theta by:

rho = x*cos(theta) + y*sin(theta)

We can also form the equation of the line y = m*x + c in Cartesian form. We can transform between the two by dividing both sides of the rho equation by sin(theta) and then moving the relevant terms to each side:

y = -(cos(theta) / sin(theta))*x + rho / sin(theta)

This gives the slope m = -1 / tan(theta) and the intercept c = rho / sin(theta).
Therefore, we should cycle through all unique pairs of lines and, using the above equations, find their point of intersection by setting their Cartesian forms equal to each other. I won’t derive this fully in the interest of saving space: simply set two lines in Cartesian form equal to each other and solve for the x coordinate of the intersection. Once that’s done, substitute this point into either of the two lines to find the y coordinate. We should obviously skip intersection points that fall outside of the image, as can happen with two nearly parallel lines, and skip pairs of lines that go in the same direction and don’t intersect.

pts = []
for i in range(lines.shape[0]):
    (rho1, theta1) = lines[i,0]
    m1 = -1/np.tan(theta1)
    c1 = rho1 / np.sin(theta1)
    for j in range(i+1,lines.shape[0]):
        (rho2, theta2) = lines[j,0]
        m2 = -1 / np.tan(theta2)
        c2 = rho2 / np.sin(theta2)
        if np.abs(m1 - m2) <= 1e-8:
            continue
        x = (c2 - c1) / (m1 - m2)
        y = m1*x + c1
        if 0 <= x < im.shape[1] and 0 <= y < im.shape[0]:
            pts.append((int(x), int(y)))

pts is a list of tuples containing all points of intersection that fall within the image bounds.

Step #4 – Use the Convex Hull

We can run the convex hull on this list of tuples to find the points that define the outer perimeter of the rectangle. Take note that the order of the points defining the rectangle is counter-clockwise. This doesn’t matter for this step but it will matter later:

pts = np.array(pts)
pts = pts[:,None] # We need to convert to a 3D numpy array with a singleton 2nd dimension
hull = cv2.convexHull(pts)

hull contains a 3D NumPy array that is the subset of the original points of intersection that create the outer boundary of the rectangle. We can draw these points on the image for illustration:

out2 = np.dstack([im, im, im])
for pt in hull[:,0]:
    cv2.circle(out2, tuple(pt), 2, (0, 255, 0), 2)

I’ve taken the original image and drawn the corner points in green. We get this image:

fig3

Step #5 – Apply K-Means clustering

As you can see in the image above, there are multiple points that map to each corner. It would be good if we could consolidate the multiple points at each corner into a single point. One way is to average all of the points in each corner, and the easiest way to do that out of the box is to use K-Means clustering. The cluster centroids then give us the final corner points of the rectangle. We need to make sure we specify 4 clusters to find.

From the K-Means clustering tutorial from the OpenCV docs, we can use this code:

# Define criteria = ( type, max_iter = 10 , epsilon = 1.0 )
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Set flags (Just to avoid line break in the code)
flags = cv2.KMEANS_RANDOM_CENTERS

# Apply KMeans
# The convex hull points need to be float32
z = hull.copy().astype(np.float32)
compactness,labels,centers = cv2.kmeans(z,4,None,criteria,10,flags)

The first parameter is the convex hull of points, which must be float32 as required by the algorithm. The second parameter specifies the number of clusters we want to search for, so 4 in our case. The third parameter can be skipped; it is a placeholder for the best cluster ID each point is assigned to, but we don’t need to use it. criteria contains the K-Means parameters used for the mechanics of the algorithm, and the fifth parameter tells us how many attempts to run to find the best clusters. We choose 10, meaning that we run K-Means 10 times and choose the clustering configuration that has the least amount of error. The error is stored in the compactness variable output by the algorithm. Finally, the last parameter is an optional flag, set so that the initial centroids of the algorithm are simply selected at random from the points.

labels provides which cluster ID is assigned to each point and centers is the key variable we need which thus returns:

array([[338.5   , 152.5   ],
       [302.6667, 368.6667],
       [139.    , 340.    ],
       [178.5   , 127.    ]], dtype=float32)

These are the four corner points of the rectangle. We can see where these line up by drawing them directly on the original image, and we also get this image:

out3 = np.dstack([im, im, im])
for pt in centers:
    cv2.circle(out3, tuple(pt), 2, (0, 255, 0), 2)

fig4

Step #6 – Measure the lengths now

Finally, we can cycle through each pair of adjacent corner points and find the corresponding dimensions. Take note that because K-Means returns the centroids in random order, we run the convex hull on these centroids to ensure that the ordering is circular (consecutive points are adjacent corners).

centers = cv2.convexHull(centers)[:,0]
for (i, j) in zip(range(4), [1, 2, 3, 0]):
    length = np.sqrt(np.sum((centers[i] - centers[j])**2.0))
    print('Length of side {}: {}'.format(i+1, length))

We thus get:

Length of side 1: 219.11654663085938
Length of side 2: 166.1582489013672
Length of side 3: 216.63160705566406
Length of side 4: 162.019287109375

To get some perspective on how the bounding box lines up, let’s draw the lines defined by these centroids on the image:

out4 = np.dstack([im, im, im])
for (i, j) in zip(range(4), [1, 2, 3, 0]):
    cv2.line(out4, tuple(centers[i]), tuple(centers[j]), (0, 0, 255), 2)

We get:

fig5

To see where this lines up with the original image, let’s just repeat the code above but drawing the lines on the original image. I downloaded a copy of the original image to do so:

out5 = cv2.imread('no8BP.png') # Note - grayscale image read in as colour
for (i, j) in zip(range(4), [1, 2, 3, 0]):
    cv2.line(out5, tuple(centers[i]), tuple(centers[j]), (0, 0, 255), 2)

fig6


For the sake of completeness, here’s the entire code from start to finish without all of the debug outputs – we go from reading the image to drawing the lines on the original image and printing the lengths of each side of the detected rectangle.

from skimage.morphology import skeletonize
import cv2
import numpy as np

# Step #1 - Skeletonize
im = cv2.imread('K7ELI.png', 0)
out = skeletonize(im > 0)

# Convert to uint8
out = 255*(out.astype(np.uint8))

# Step #2 - Hough Transform
lines = cv2.HoughLines(out,1,np.pi/180,60)

# Step #3 - Find points of intersection
pts = []
for i in range(lines.shape[0]):
    (rho1, theta1) = lines[i,0]
    m1 = -1/np.tan(theta1)
    c1 = rho1 / np.sin(theta1)
    for j in range(i+1,lines.shape[0]):
        (rho2, theta2) = lines[j,0]
        m2 = -1 / np.tan(theta2)
        c2 = rho2 / np.sin(theta2)
        if np.abs(m1 - m2) <= 1e-8:
            continue
        x = (c2 - c1) / (m1 - m2)
        y = m1*x + c1
        if 0 <= x < im.shape[1] and 0 <= y < im.shape[0]:
            pts.append((int(x), int(y)))

# Step #4 - Find convex hull
pts = np.array(pts)
pts = pts[:,None] # We need to convert to a 3D numpy array with a singleton 2nd dimension
hull = cv2.convexHull(pts)

# Step #5 - K-Means clustering
# Define criteria = ( type, max_iter = 10 , epsilon = 1.0 )
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Set flags (Just to avoid line break in the code)
flags = cv2.KMEANS_RANDOM_CENTERS

# Apply KMeans
# The convex hull points need to be float32
z = hull.copy().astype(np.float32)
compactness,labels,centers = cv2.kmeans(z,4,None,criteria,10,flags)

# Step #6 - Find the lengths of each side
centers = cv2.convexHull(centers)[:,0]
for (i, j) in zip(range(4), [1, 2, 3, 0]):
    length = np.sqrt(np.sum((centers[i] - centers[j])**2.0))
    print('Length of side {}: {}'.format(i+1, length))

# Draw the sides of each rectangle in the original image
out5 = cv2.imread('no8BP.png') # Note - grayscale image read in as colour
for (i, j) in zip(range(4), [1, 2, 3, 0]):
    cv2.line(out5, tuple(centers[i]), tuple(centers[j]), (0, 0, 255), 2)

# Show the image
cv2.imshow('Output', out5); cv2.waitKey(0); cv2.destroyAllWindows()
Answered By: rayryeng

There are already two good solutions, but I wanted to propose a simpler one based on a different way of detecting the rectangle.

(I’m using MATLAB with DIPimage here, because that’s quicker for me to toss a proof of concept together than Python, but the same exact functionality is available in Python, see at the end of the post. Disclaimer: I’m an author of DIPimage.)

Because the rectangle is a bright shape on a darker background, and is (I presume) guaranteed to surround the center of the image, we can generate a seed in the center of the image and a seed in the periphery, and use the watershed to find the rectangle. The watershed is guaranteed, in this case, to produce a single, 1-pixel thick, closed contour.

img = readim('https://i.stack.imgur.com/no8BP.png');
seeds = clone(img);
seeds(rr(seeds)<50) = 1;
seeds(rr(seeds)>250) = 2;
rect = waterseed(seeds,gaussf(img));
overlay(img,rect) % for display only

OP's image with a red rectangle drawn over the rectangle to be detected

Note that I smoothed the input image a bit. But the rectangle is still quite noisy, which will affect the size measurement we’ll do later. We can smooth it using a morphological opening with a large circular structuring element. This operation will also cut off the corners, but the rounded corners will not influence the measurement result.

rect = opening(fillholes(rect),35);
overlay(img,rect-berosion(rect)) % for display only

Same image, but the rectangle is smoother and with rounded corners

Now we have a nice shape that is suitable for measurement. The Feret diameters are the lengths of the projections of a shape. We measure the length of the shortest projection (equal to the width of the rectangle) and the length of the projection perpendicular to the shortest (equal to the length of the rectangle). See this blog post of mine for a detailed description of the algorithm that computes these lengths.

msr = measure(rect,[],'feret');
sz = msr(1).feret(2:3)

This returns sz = [162.7506, 215.0775].


Here is the Python equivalent of the code above (the exact same algorithm implementations are run). PyDIP, the Python bindings for the DIPlib library, is a more or less 1:1 mapping of the C++ library function names, and thus the syntax is a bit more verbose than that of the DIPimage toolbox I’m using above. Install with pip install diplib.

import diplib as dip

img = dip.ImageRead('no8BP.png')

seeds = img.Similar()
seeds.Fill(0)
rr = dip.CreateRadiusCoordinate(seeds.Sizes())
seeds[rr < 50] = 1
seeds[rr > 250] = 2

rect = dip.SeededWatershed(dip.Gauss(img), seeds)
dip.viewer.ShowModal(dip.Overlay(img, rect))

rect = dip.Opening(dip.FillHoles(rect), 35)
dip.viewer.ShowModal(dip.Overlay(img, rect-dip.BinaryErosion(rect, 1, 1)))

msr = dip.MeasurementTool.Measure(dip.Label(rect), features=['Feret'])
sz = (msr[1]['Feret'][1], msr[1]['Feret'][2])
print(sz)

You could likely implement this in OpenCV as well, but it might be a bit more involved. For example, the two Feret measures we compute here are equivalent to what is returned by OpenCV’s minAreaRect, and the seeded watershed is included in OpenCV’s watershed.
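
As a rough, untested sketch of what an OpenCV-only version along those lines might look like (this is an illustration, not a drop-in equivalent of the method above: it reads the extracted rectangle mask from the question, closes small gaps, and measures the minimum-area rotated bounding box of the largest contour, so it reports the outer edge of the thick outline rather than its centreline; the file name and kernel size are assumptions):

import cv2

# Load the extracted rectangle mask (the question's connected-components output).
mask = cv2.imread('no8BP.png', 0)
mask = cv2.threshold(mask, 0, 255, cv2.THRESH_BINARY)[1]

# Close small gaps so the noisy outline forms one connected blob.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

# OpenCV 4 returns (contours, hierarchy); take the largest contour.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)

# minAreaRect gives ((cx, cy), (w, h), angle); w and h are the side lengths in pixels.
(cx, cy), (w, h), angle = cv2.minAreaRect(largest)
print('Estimated side lengths (pixels):', w, h)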

Answered By: Cris Luengo