How to extract the contour of the front panel of the washing machine?
Question:
Answers:
Three potential options:
- Collect a set of images of the machines, manually label where the door is in each one, and then train a convolutional neural network to regress those label parameters per image.
- Treat each image as a separate optimization problem, where the goal is to estimate the parameters of the rectangle most likely to correspond to the front panel. The model is $\theta = (p_1, p_2, p_3, p_4)$, the four 2D corner locations of the panel in the image. We need an energy function $E$ to minimize with respect to $\theta$ (e.g., using gradient descent with momentum, or RANSAC). There are a number of terms you can use; here are some ideas:
a. At least some of the corners should be “corner-like”: run a simple corner detector, and define an energy $E_{\text{corner}}$ which penalizes the distance from each $p_i$ to the closest detected corner.
b. At least some of the edges (between $p_1$ and $p_2$ or $p_3$, for example) should be “edge-like”: compute the gradient magnitude of the image, $M = \lVert \nabla I \rVert$, and enforce that the values of $M$ are large along the panel edge, using an energy $E_{\text{edge}}$. E.g., for $(x, y)$ along an edge, let $E_{\text{edge}}(x, y) = \frac{1}{1 + M(x, y)}$, which is small where the gradient is strong. (Robust losses tend to be better here though.)
c. Use the fact that each door is actually a projected 3D rectangle: e.g., see this question. An interesting idea is to start with a rectangle (representing the panel) and, instead of regressing the $p_i$ directly, regress the parameters of an affine transform or even a perspective projection transform (though this requires the algorithm to estimate depth) that maps the starting rectangle to one in the image. You can then regularize the parameters of the estimated transform to prevent unlikely transforms from being output.
d. Use knowledge of what must be inside the rectangle. For instance, given the four corners, you can determine the ellipse defining the round door of the machine. The appearance statistics within that ellipse should be fairly distinctive, as should the edges/image gradient at the door boundary; hence you can define an energy term encouraging the model to choose corners such that the interior has a dark elliptical object on a white background.
Overall, this approach is similar to snakes (active contour models), which may be worth looking into. However, energy-minimizing snakes tend not to consider the interior of the region they enclose; hence, some variant of the Mumford-Shah functional could be a useful addition (though note that smoothness of the “door region” is not entirely desirable in your case).
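To make terms (a) and (b) concrete, here is a minimal NumPy sketch of the two energies. It is only an illustration under assumptions of mine, not a tuned implementation: the function names, the weights `w_corner` and `w_edge`, the nearest-pixel sampling along edges, and the averaging over edge samples are all hypothetical choices, and `detected_corners` is assumed to come from whatever corner detector you run beforehand.

```python
import numpy as np

def gradient_magnitude(image):
    """M = ||grad I||, using central finite differences."""
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows (y), columns (x)
    return np.hypot(gx, gy)

def corner_energy(theta, detected_corners):
    """E_corner: sum over the p_i of the distance to the closest detected corner.

    theta: (4, 2) array of (x, y) panel corners.
    detected_corners: (K, 2) array of (x, y) corner detections (assumed given).
    """
    diffs = theta[:, None, :] - detected_corners[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)      # shape (4, K)
    return dists.min(axis=1).sum()

def edge_energy(theta, M, samples_per_edge=50):
    """E_edge: average of 1 / (1 + M(x, y)) over points sampled on the four edges."""
    h, w = M.shape
    t = np.linspace(0.0, 1.0, samples_per_edge)[:, None]
    total = 0.0
    for i in range(4):
        p, q = theta[i], theta[(i + 1) % 4]
        pts = (1.0 - t) * p + t * q            # points on the segment p -> q
        # Nearest-pixel lookup, clipped to the image bounds (an assumption).
        xs = np.clip(np.rint(pts[:, 0]).astype(int), 0, w - 1)
        ys = np.clip(np.rint(pts[:, 1]).astype(int), 0, h - 1)
        total += np.mean(1.0 / (1.0 + M[ys, xs]))
    return total / 4.0

def total_energy(theta, M, detected_corners, w_corner=1.0, w_edge=1.0):
    """Weighted sum of the two terms; the weights are hypothetical."""
    return (w_corner * corner_energy(theta, detected_corners)
            + w_edge * edge_energy(theta, M))
```

Because of the nearest-pixel lookup, this version is piecewise constant in $\theta$, so for gradient descent you would likely want interpolated sampling of $M$ (or a derivative-free optimizer) instead.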
- If all your machines are very similar or nearly identical (as the ones you’ve posted are), it might actually be best to estimate a homography between the images. (See also here or here.) Since the front of the machine is nearly planar, the fronts in different images are related, to a good approximation, by a homography. Knowing where the front panel is in one image then tells you where it is in all of them. For instance, check out the OpenCV tutorial for homographies, which shows how to undo the perspective transform of a planar surface, allowing you to warp one image onto another (here, one projected machine panel onto a template one).
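As a rough illustration of the homography option, the sketch below estimates a homography from point correspondences with the direct linear transform and uses it to carry the panel corners from a template image into a new one. It assumes clean, already-matched points; the function names and the example coordinates are hypothetical. In practice the matches would come from feature matching and contain outliers, which is what OpenCV's `cv2.findHomography` with the `cv2.RANSAC` method is meant to handle.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform: find H (3x3) with dst ~ H @ src in homogeneous coords.

    src, dst: (N, 2) arrays of matching (x, y) points, N >= 4, assumed outlier-free.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the right singular vector for the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale; assumes H[2, 2] is not (near) zero

def apply_homography(H, pts):
    """Map (N, 2) points through H, including the perspective division."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical usage: matched points between a template image and a new image,
# plus the panel corners labelled once in the template.
template_pts = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [40, 25], [30, 70]], float)
H_example = np.array([[1.1, 0.05, 12.0], [-0.03, 0.95, -7.0], [1e-4, 2e-4, 1.0]])
new_pts = apply_homography(H_example, template_pts)   # stands in for real matches
H_est = estimate_homography(template_pts, new_pts)
panel_in_template = np.array([[10, 10], [90, 10], [90, 30], [10, 30]], float)
panel_in_new = apply_homography(H_est, panel_in_template)
```

The panel only needs to be labelled once, in the template; every other image just needs enough reliable matches on the (nearly planar) front of the machine.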