Sorting contours based on precedence in Python, OpenCV

Question:

I am trying to sort contours in reading order, left-to-right and top-to-bottom, just like how you write anything: starting from the top-left, then whichever comes next.

This is what I have achieved so far:

import os

import cv2
from PIL import Image


def get_contour_precedence(contour, cols):
    tolerance_factor = 61
    origin = cv2.boundingRect(contour)  # (x, y, w, h)
    return ((origin[1] // tolerance_factor) * tolerance_factor) * cols + origin[0]


image = cv2.imread("C:/Users/XXXX/PycharmProjects/OCR/raw_dataset/23.png", 0)

ret, thresh1 = cv2.threshold(image, 130, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

contours, hierarchy = cv2.findContours(thresh1.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort the resulting contours in reading order,
# top-to-bottom and then left-to-right
contours.sort(key=lambda x: get_contour_precedence(x, thresh1.shape[1]))

# initialize the list of contour bounding boxes and associated
# characters that we'll be OCR'ing
chars = []
inc = 0
# loop over the contours
for c in contours:
    inc += 1

    # compute the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)

    label = str(inc)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, label, (x - 2, y - 2),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    print('x=', x)
    print('y=', y)
    print('x+w=', x + w)
    print('y+h=', y + h)
    crop_img = image[y + 2:y + h - 1, x + 2:x + w - 1]
    name = os.path.join("bounding boxes", 'Image_%d.png' % inc)
    cv2.imshow("cropped", crop_img)
    print(name)
    crop_img = Image.fromarray(crop_img)
    crop_img.save(name)
    cv2.waitKey(0)

cv2.imshow('mat', image)
cv2.waitKey(0)

Input Image 1: [image]

Output Image 1 (working): [image]

Input Image 2: [image]

Output Image 2: [image]

Input Image 3: [image]

Output Image 3: [image]

As you can see, the numbering 1, 2, 3, 4 is not what I was expecting for each image, as displayed in Image 3.

How do I adjust this to make it work or even write a custom function?

NOTE: I have multiple variants of the input image shown in my question. The content is the same, but the text varies between them, so a fixed tolerance factor does not work for every one. Adjusting it manually would not be a good idea.

Asked By: Jimit Vaghela


Answers:

Instead of taking the upper left corner of the contour, I’d rather use the centroid or at least the bounding box center.

def get_contour_precedence(contour, cols):
    tolerance_factor = 4
    origin = cv2.boundingRect(contour)  # (x, y, w, h)
    # bounding box center
    center_x = origin[0] + origin[2] / 2
    center_y = origin[1] + origin[3] / 2
    return ((center_y // tolerance_factor) * tolerance_factor) * cols + center_x

But it might be hard to find a tolerance value that works in all cases.
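One possible way around that tuning problem – a minimal sketch, assuming the characters in a row have roughly similar heights – is to derive the tolerance from the contours themselves, for example from the median bounding-box height, so it scales with the text size:

import cv2
import numpy as np

def sort_reading_order(contours, cols):
    # derive the tolerance from the data: the median bounding-box height
    heights = [cv2.boundingRect(c)[3] for c in contours]
    tolerance_factor = max(1, int(np.median(heights)))

    def precedence(contour):
        x, y, w, h = cv2.boundingRect(contour)
        cx, cy = x + w // 2, y + h // 2  # bounding box center
        return (cy // tolerance_factor) * tolerance_factor * cols + cx

    return sorted(contours, key=precedence)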

Answered By: antoine

I would even say: use image moments (via cv2.moments), which tend to give a better estimate of the center point of a polygon
than the "normal" center point of its bounding rectangle, so the function could be:

def get_contour_precedence(contour, cols):
    tolerance_factor = 61
    M = cv2.moments(contour)
    # calculate the x,y coordinates of the centroid
    if M["m00"] != 0:
        cX = int(M["m10"] / M["m00"])
        cY = int(M["m01"] / M["m00"])
    else:
        # set values as what you need in the situation
        cX, cY = 0, 0
    return ((cY // tolerance_factor) * tolerance_factor) * cols + cX
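This is a drop-in replacement for the function in the question, so the sorting call stays the same, for example:

# thresh1 is the thresholded image from the question
contours = sorted(contours, key=lambda c: get_contour_precedence(c, thresh1.shape[1]))

Using sorted() instead of list.sort() also works when findContours returns a tuple, as it does in some OpenCV versions.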

A thorough mathematical explanation of what image moments are can be found here.

Maybe you should also think about getting rid of this tolerance_factor altogether
by using a clustering algorithm such as
k-means to cluster your centers into rows and columns.
OpenCV has a k-means implementation which you can find here.
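A rough sketch of the k-means idea (my own illustration; the main assumption is that the number of text rows is known or estimated beforehand): cluster the centroid y coordinates into rows, visit the rows top to bottom, and sort each row left to right:

import cv2
import numpy as np

def sort_contours_kmeans(contours, n_rows):
    # compute a centroid for every contour via image moments
    centroids = []
    for c in contours:
        M = cv2.moments(c)
        if M["m00"] != 0:
            centroids.append((M["m10"] / M["m00"], M["m01"] / M["m00"]))
        else:
            x, y, w, h = cv2.boundingRect(c)
            centroids.append((x + w / 2.0, y + h / 2.0))

    # cluster the centroid y coordinates into n_rows groups
    ys = np.float32([[cy] for _, cy in centroids])
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(ys, n_rows, None, criteria, 10,
                                    cv2.KMEANS_PP_CENTERS)

    # visit the rows top-to-bottom, sorting each row by centroid x
    ordered = []
    for row in np.argsort(centers.ravel()):
        members = [i for i in range(len(contours)) if labels[i][0] == row]
        members.sort(key=lambda i: centroids[i][0])
        ordered.extend(contours[i] for i in members)
    return ordered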

I do not know exactly what your goal is, but another idea could be to split every line into a Region of Interest (ROI)
for further processing; afterwards, you could easily count the letters
using the x values of each contour and the line number:

import cv2
import numpy as np

## (1) read
img = cv2.imread("yFX3M.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

## (2) threshold
th, threshed = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV|cv2.THRESH_OTSU)

## (3) minAreaRect on the nozeros
pts = cv2.findNonZero(threshed)
ret = cv2.minAreaRect(pts)

(cx,cy), (w,h), ang = ret
if w>h:
    w,h = h,w
    ang += 90  # keep the rotation angle consistent after swapping w and h

## (4) Find rotated matrix, do rotation
M = cv2.getRotationMatrix2D((cx,cy), ang, 1.0)
rotated = cv2.warpAffine(threshed, M, (img.shape[1], img.shape[0]))

## (5) find and draw the upper and lower boundary of each line
hist = cv2.reduce(rotated,1, cv2.REDUCE_AVG).reshape(-1)

hist_th = 2   # threshold for the averaged row histogram
H,W = img.shape[:2]
## (6) find the upper and lower boundary of each line using the histogram
uppers = [y for y in range(H-1) if hist[y]<=hist_th and hist[y+1]>hist_th]
lowers = [y for y in range(H-1) if hist[y]>hist_th and hist[y+1]<=hist_th]

rotated = cv2.cvtColor(rotated, cv2.COLOR_GRAY2BGR)
for y in uppers:
    cv2.line(rotated, (0,y), (W, y), (255,0,0), 1)

for y in lowers:
    cv2.line(rotated, (0,y), (W, y), (0,255,0), 1)
cv2.imshow('pic', rotated)

## (7) iterate over all line ROIs
for i in range(len(uppers)):
    print('line=',i)
    roi = rotated[uppers[i]:lowers[i],0:W]
    cv2.imshow('line', roi)
    cv2.waitKey(0)
    # here you could threshold again and find the per-character contours

I found this code in an old post here.

Answered By: t2solve

Here is one way in Python/OpenCV: process the rows first, then the characters within each row.

  • Read the input
  • Convert to grayscale
  • Threshold and invert
  • Use a long horizontal kernel and apply morphology close to form rows
  • Get the contours of the rows and their bounding boxes
  • Save the row boxes and sort on Y
  • Loop over each sorted row box and extract the row from the thresholded image
  • Get the contours of each character in the row and save the bounding boxes of the characters
  • Sort the contours for a given row on X
  • Draw the bounding boxes on the input and the index number as text on the image
  • Increment the index
  • Save the results

Input:

[image]

import cv2
import numpy as np

# read input image
img = cv2.imread('vision78.png')

# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# otsu threshold
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU )[1]
thresh = 255 - thresh 

# apply morphology close to form rows
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (51,1))
morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

# find contours and bounding boxes of rows
rows_img = img.copy()
boxes_img = img.copy()
rowboxes = []
rowcontours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
rowcontours = rowcontours[0] if len(rowcontours) == 2 else rowcontours[1]
index = 1
for rowcntr in rowcontours:
    xr,yr,wr,hr = cv2.boundingRect(rowcntr)
    cv2.rectangle(rows_img, (xr, yr), (xr+wr, yr+hr), (0, 0, 255), 1)
    rowboxes.append((xr,yr,wr,hr))

# sort row boxes on y coordinate
rowboxes.sort(key=lambda box: box[1])
    
# loop over each row    
for rowbox in rowboxes:
    # crop the image for a given row
    xr = rowbox[0]
    yr = rowbox[1]
    wr = rowbox[2]
    hr = rowbox[3]  
    row = thresh[yr:yr+hr, xr:xr+wr]
    bboxes = []
    # find contours of each character in the row
    contours = cv2.findContours(row, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if len(contours) == 2 else contours[1]
    for cntr in contours:
        x,y,w,h = cv2.boundingRect(cntr)
        bboxes.append((x+xr,y+yr,w,h))
    # sort bboxes on x coordinate
    bboxes.sort(key=lambda box: box[0])
    # draw sorted boxes
    for box in bboxes:
        xb = box[0]
        yb = box[1]
        wb = box[2]
        hb = box[3]
        cv2.rectangle(boxes_img, (xb, yb), (xb+wb, yb+hb), (0, 0, 255), 1)
        cv2.putText(boxes_img, str(index), (xb,yb), cv2.FONT_HERSHEY_COMPLEX_SMALL, 0.75, (0,255,0), 1)
        index = index + 1
    
# save result
cv2.imwrite("vision78_thresh.jpg", thresh)
cv2.imwrite("vision78_morph.jpg", morph)
cv2.imwrite("vision78_rows.jpg", rows_img)
cv2.imwrite("vision78_boxes.jpg", boxes_img)

# show images
cv2.imshow("thresh", thresh)
cv2.imshow("morph", morph)
cv2.imshow("rows_img", rows_img)
cv2.imshow("boxes_img", boxes_img)
cv2.waitKey(0)

Threshold image:

[image]

Morphology image of rows:

[image]

Row contours image:

[image]

Character contours image:

[image]

Answered By: fmw42

This is my take on the problem. I’ll give you the general gist of it, and then my implementation in C++. The main idea is that I want to process the image from left to right, top to bottom. I’ll process each blob (or contour) as I find it; however, I need a couple of intermediate steps to achieve a successful (that is, ordered) segmentation.

Vertical sort using rows

The first step is trying to sort the blobs by rows – this means that each row holds a set of (still unordered) horizontal blobs. That’s ok: sorting by rows is itself a kind of vertical sorting, and if we process each row from top to bottom, we achieve just that.

After the blobs are (vertically) sorted by rows, I can check out their centroids (or centers of mass) and sort them horizontally. The idea is that I will process row by row and, for each row, sort the blob centroids. Let’s see an example of what I’m trying to achieve here.

This is your input image:

This is what I call the Row Mask:

This last image contains white areas that each represent a "row". Each row has a number (e.g., Row1, Row2, etc.) and each row holds a set of blobs (or characters, in this case). By processing each row from top to bottom, you are already sorting the blobs on the vertical axis.

If I number each row from top to bottom, I get this image:

The Row Mask is a way of creating "rows of blobs", and this mask can be computed morphologically. Check out the 2 images overlaid to give you a better view of the processing order:

What we are trying to do here is, first, a vertical ordering (blue arrow) and then we will take care of the horizontal (red arrow) ordering. You can see that by processing each row we can (possibly) overcome the sorting problem!

Horizontal sort using centroids

Let’s see now how we can sort the blobs horizontally. If we create a simpler image, with a width equal to the input image and a height equal to the numbers of rows in our Row Mask, we can simply overlay every horizontal coordinate (x coordinate) of each blob centroid. Check out this example:

This is a Row Table. Each of its rows corresponds to one of the rows found in the Row Mask, and it is also read from top to bottom. The width of the table is the same as the width of your input image, and corresponds spatially to the horizontal axis. Each square is a pixel in your input image, mapped to the Row Table using only the horizontal coordinate (as our simplification of rows is pretty straightforward). The actual value of each pixel in the Row Table is a label, labeling each of the blobs on your input image. Note that the labels are not ordered!

So, for instance, this table shows that in row 1 (you already know what row 1 is – it’s the first white area on the Row Mask), at position (1,4), there’s blob number 3. At position (1,6) there’s blob number 2, and so on. What’s cool (I think) about this table is that you can loop through it and, for every value different from 0, horizontal ordering becomes very trivial. This is the Row Table, now ordered left to right:

Mapping blob information with centroids

We are going to use blobs centroids to map the information between our two representations (Row Mask/Row Table). Suppose you already have both "helper" images and you process each blob (or contour) on the input image at a time. For example, you have this as a start:

Alright, there’s a blob here. How can we map it to the Row Mask and to the Row Table? Using its centroid. If we compute the centroid (shown in the figure as the green dot), we can construct a dictionary of centroids and labels. For example, for this blob, the centroid is located at (271,193). Ok, let’s assign the label = 1. So we now have this dictionary:

Now, we find the row in which this blob is placed using the same centroid on the Row Mask. Something like this:

rowNumber = rowMask.at( 193, 271 ) //at(row, col) – note the (y, x) order

This operation should return rowNumber = 3. Nice! We know which row our blob is placed in, and so it is now vertically ordered. Now, let’s store its horizontal coordinate in the Row Table:

rowTable.at( 3, 271 ) = 1 //row 3 from the Row Mask, column 271 (the centroid’s x)

Now, rowTable holds (in its row and column) the label of the processed blob. The Row Table should look something like this:

The table is a lot wider, because its horizontal dimension has to be the same as that of your input image. In this image, label 1 is placed at Column 271, Row 3. If this were the only blob on your image, the blobs would already be sorted. But what happens if you add another blob at, say, Column 2, Row 1? That’s why you need to traverse this table again after you have processed all the blobs – to properly correct their labels.

Implementation in C++

Alright, hopefully the algorithm is a little bit clearer now (if not, just ask, my man). I’ll try to implement these ideas in OpenCV using C++. First, I need a binary image of your input. Computation is trivial using Otsu’s thresholding method:

//Read the input image:
std::string imageName = "C://opencvImages//yFX3M.png";
cv::Mat testImage = cv::imread( imageName );

//Compute grayscale image (imread loads BGR):
cv::Mat grayImage;
cv::cvtColor( testImage, grayImage, cv::COLOR_BGR2GRAY );

//Get binary image via Otsu:
cv::Mat binImage;
cv::threshold( grayImage, binImage, 0, 255, cv::THRESH_OTSU );

//Invert image:
binImage = 255 - binImage;

This is the resulting binary image, nothing fancy, just what we need to start working:

The first step is to get the Row Mask. This can be achieved using morphology. Just apply a dilation + erosion with a VERY big horizontal structuring element. The idea is you want to turn those blobs into rectangles, "fusing" them together horizontally:

//Create a hard copy of the binary mask:
cv::Mat rowMask = binImage.clone();

//horizontal dilation + erosion:
int horizontalSize = 100; // a very big horizontal structuring element
cv::Mat SE = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(horizontalSize,1) );
cv::morphologyEx( rowMask, rowMask, cv::MORPH_DILATE, SE, cv::Point(-1,-1), 2 );
cv::morphologyEx( rowMask, rowMask, cv::MORPH_ERODE, SE, cv::Point(-1,-1), 1 );

This results in the following Row Mask:

That’s very cool; now that we have our Row Mask, we must number the rows, ok? There are a lot of ways of doing this, but right now I’m interested in the simplest one: loop through this image and get every single pixel. If a pixel is white, use a Flood Fill operation to label that portion of the image as a unique blob (or row, in this case). This can be done as follows:

//Label the row mask:
int rowCount = 0; //This will count our rows

//Loop thru the mask:
for( int y = 0; y < rowMask.rows; y++ ){
    for( int x = 0; x < rowMask.cols; x++ ){
        //Get the current pixel:
        uchar currentPixel = rowMask.at<uchar>( y, x );
        //If the pixel is white, this is an unlabeled blob:
        if ( currentPixel == 255 ) {
            //Create new label (different from zero):
            rowCount++;
            //Flood fill on this point:
            cv::floodFill( rowMask, cv::Point( x, y ), rowCount, (cv::Rect*)0, cv::Scalar(), 0 );
        }
    }
}

This process will label all the rows from 1 to r. That’s what we wanted. If you check out the image you’ll faintly see the rows; that’s because our labels correspond to very low grayscale intensity values.

Ok, now let’s prepare the Row Table. This "table" really is just another image, remember: same width as the input, and a height equal to the number of rows you counted in the Row Mask:

//create rows image:
cv::Mat rowTable = cv::Mat::zeros( cv::Size(binImage.cols, rowCount), CV_8UC1 );
//Just for convenience:
rowTable = 255 - rowTable;

Here, I just inverted the final image for convenience, because I want to actually see how the table is populated with (very low intensity) pixels and be sure that everything is working as intended.

Now comes the fun part. We have both images (or data containers) prepared. We need to process each blob independently. The idea is that you have to extract each blob/contour/character from the binary image, compute its centroid, and assign a new label. Again, there are a lot of ways of doing this. Here, I’m using the following approach:

I’ll loop through the binary mask. I’ll get the current biggest blob from this binary input, compute its centroid, store its data in every container needed, and then delete that blob from the mask. I’ll repeat the process until no more blobs are left. This is my way of doing it, especially because I have functions I already wrote for that. This is the approach:

//Prepare a couple of dictionaries for data storing:
std::map< int, cv::Point > blobMap; //holds label, gives centroid
std::map< int, cv::Rect > boundingBoxMap; //holds label, gives bounding box

First, two dictionaries. One receives a blob label and returns the centroid. The other one receives the same label and returns the bounding box.

//Extract each individual blob:
cv::Mat blobFilterInput = binImage.clone();

//The new blob label:
int blobLabel = 0;

//Some control variables:
bool extractBlobs = true; //Controls loop
int currentBlob = 0; //Counter of blobs

while ( extractBlobs ){

    //Get the biggest blob:
    cv::Mat biggestBlob = findBiggestBlob( blobFilterInput );

    //Compute the centroid/center of mass:
    cv::Moments momentStructure = cv::moments( biggestBlob, true );
    float cx = momentStructure.m10 / momentStructure.m00;
    float cy = momentStructure.m01 / momentStructure.m00;

    //Centroid point:
    cv::Point blobCentroid;
    blobCentroid.x = cx;
    blobCentroid.y = cy;

    //Compute bounding box:
    boundingBox boxData;
    computeBoundingBox( biggestBlob, boxData );

    //Convert boundingBox data into opencv rect data:
    cv::Rect cropBox = boundingBox2Rect( boxData );


    //Label blob:
    blobLabel++;
    blobMap.emplace( blobLabel, blobCentroid );
    boundingBoxMap.emplace( blobLabel, cropBox );

    //Get the row for this centroid
    int blobRow = rowMask.at<uchar>( cy, cx );
    blobRow--;

    //Place centroid on rowed image:
    rowTable.at<uchar>( blobRow, cx ) = blobLabel;

    //Resume blob flow control:
    cv::Mat blobDifference = blobFilterInput - biggestBlob;
    //How many pixels are left on the new mask?
    int pixelsLeft = cv::countNonZero( blobDifference );
    blobFilterInput = blobDifference;

    //Done extracting blobs?
    if ( pixelsLeft <= 0 ){
        extractBlobs = false;
    }

    //Increment blob counter:
    currentBlob++;

}

Check out a nice animation of how this processing goes through each blob, processes it and deletes it until there’s nothing left:

Now, some notes on the above snippet. I have some helper functions: findBiggestBlob, computeBoundingBox and boundingBox2Rect. The first computes the biggest blob in a binary image, the second computes its bounding box as a custom structure, and the third converts that custom structure into OpenCV’s Rect structure.

The "meat" of the snippet is this: Once you have an isolated blob, compute its centroid (I actually compute the center of mass via central moments). Generate a new label. Store this label and centroid in a dictionary, in my case, the blobMap dictionary. Additionally compute the bounding box and store it in another dictionary, boundingBoxMap:

//Label blob:
blobLabel++;
blobMap.emplace( blobLabel, blobCentroid );
boundingBoxMap.emplace( blobLabel, cropBox );

Now, using the centroid data, fetch the corresponding row of that blob. Once you get the row, store this number into your row table:

//Get the row for this centroid
int blobRow = rowMask.at<uchar>( cy, cx );
blobRow--;

//Place centroid on rowed image:
rowTable.at<uchar>( blobRow, cx ) = blobLabel;

Excellent. At this point you have the Row Table ready. Let’s loop through it and, finally, order those damn blobs:

int blobCounter = 1; //The ORDERED label, starting at 1
for( int y = 0; y < rowTable.rows; y++ ){
    for( int x = 0; x < rowTable.cols; x++ ){
        //Get current label:
        uchar currentLabel = rowTable.at<uchar>( y, x );
        //Is it a valid label?
        if ( currentLabel != 255 ){
            //Get the bounding box for this label:
            cv::Rect currentBoundingBox = boundingBoxMap[ currentLabel ];
            cv::rectangle( testImage, currentBoundingBox, cv::Scalar(0,255,0), 2, 8, 0 );
            //The blob counter to string:
            std::string counterString = std::to_string( blobCounter );
            cv::putText( testImage, counterString, cv::Point( currentBoundingBox.x, currentBoundingBox.y-1 ),
                         cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(255,0,0), 1, cv::LINE_8, false );
            blobCounter++; //Increment the blob/label
        }
    }
}

Nothing fancy, just a regular nested for loop, looping through each pixel on the Row Table. If the pixel is different from white, use the label to retrieve the bounding box, and just change the label to an increasing number. For displaying the result, I just draw the bounding boxes and the new label on the original image.

Check out the ordered processing in this animation:

Very cool, here’s a bonus animation, the Row Table getting populated with horizontal coordinates:

Answered By: stateMachine