Identify the borders and column contours of a table that has no visible outline within an image

Question:

I have a set of images, each containing a table. In some images the table is already aligned and its borders are drawn, so it is not hard to identify the main table using Canny edge detection. Other images contain tables without any borders, so I am trying to identify the table in the image and plot its border contours as well as its columns.

I am using OpenCV version 3.4, and the approach I’m generally taking is as follows:

  1. dilate the grayscale image to identify the text spots.
  2. apply the cv2.findContours function to get the text’s bounding boxes.
  3. cluster the bounding boxes in case smaller tables were identified instead of the main table.
  4. try to draw the contours in the hope of identifying the borders of the table.

This approach seems to work to a certain extent but the drawn contours are not at all accurate.

    import cv2

    # gray_matrix is the dilated single-channel image and output_image is
    # the image to draw on; both come from the earlier steps.
    # OpenCV 3.x: findContours returns (image, contours, hierarchy)
    img, contours, hierarchy = cv2.findContours(gray_matrix, cv2.RETR_LIST,
                                                cv2.CHAIN_APPROX_SIMPLE)

    # get bounding boxes around any text
    boxes = []
    for contour in contours:
        box = cv2.boundingRect(contour)  # (x, y, w, h)
        boxes.append(box)

    rows = {}
    cols = {}
    cell_threshold = 10  # bucket size in pixels

    # Clustering the bounding boxes by their positions
    for box in boxes:
        (x, y, w, h) = box
        col_key = x // cell_threshold
        row_key = y // cell_threshold
        cols.setdefault(col_key, []).append(box)
        rows.setdefault(row_key, []).append(box)

    # Filtering out the clusters having less than 4 cols
    table_cells = list(filter(lambda r: len(r) >= 4, rows.values()))
    # Sorting the row cells by x coord
    table_cells = [list(sorted(tb)) for tb in table_cells]
    # Sorting the rows by y coord
    table_cells = list(sorted(table_cells, key=lambda r: r[0][1]))

    # attempt to identify columns

    # right edge: taken from the row whose last box is the widest
    max_last_col_width_row = max(table_cells, key=lambda b: b[-1][2])
    max_x = max_last_col_width_row[-1][0] + max_last_col_width_row[-1][2]

    # bottom edge: taken from the tallest box in the last row
    max_last_row_height_box = max(table_cells[-1], key=lambda b: b[3])
    max_y = max_last_row_height_box[1] + max_last_row_height_box[3]

    hor_lines = []
    ver_lines = []

    # one horizontal line along the top of each row
    for row in table_cells:
        x = row[0][0]
        y = row[0][1]
        hor_lines.append((x, y, max_x, y))

    # one vertical line at the left of each cell in the first row
    for box in table_cells[0]:
        x = box[0]
        y = box[1]
        ver_lines.append((x, y, x, max_y))

    # closing lines on the right and at the bottom
    (x, y, w, h) = table_cells[0][-1]
    ver_lines.append((max_x, y, max_x, max_y))
    (x, y, w, h) = table_cells[0][0]
    hor_lines.append((x, max_y, max_x, max_y))

    for line in hor_lines + ver_lines:
        [x1, y1, x2, y2] = line
        cv2.line(output_image, (x1, y1), (x2, y2), (0, 0, 255), 1)

    cv2.imshow('Proper Table Borders', output_image)
    cv2.waitKey(0)

I am trying to achieve something like the image below.

[image: example of the desired result]

In short, how can I find the invisible borders of a table structure in an image, and how can I identify the x coordinates of that table’s columns?

I know the above code is far from optimal for producing the required outcome, but I am still learning OpenCV. I have tried various approaches and have not yet reached the desired result.

Asked By: ramez

Answers:

Try a vertical profile, which is the count of text (black) pixels with the same X coordinate within a certain (Y0, Y1) range (the table's vertical span). Regions where the count is zero or near zero indicate the table's column borders. Here is a hand-drawn, approximate profile for your example:

[image: hand-drawn vertical profile]
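
A minimal sketch of the vertical profile idea, using NumPy only. The function name `column_gaps` and the parameters `max_ink` and `min_width` are illustrative choices, not part of OpenCV, and the sketch assumes you already have a boolean mask that is True where a pixel belongs to text (for dark text on a light background, something like `gray < 128` would be one way to get it).

```python
import numpy as np

def column_gaps(text_mask, y0, y1, max_ink=0, min_width=3):
    """Return (x_start, x_end) runs of near-empty columns, x_end exclusive.

    text_mask: 2-D boolean array, True where a pixel is text.
    y0, y1:    vertical span of the table (rows y0 .. y1-1 are counted).
    max_ink:   a column counts as empty if it has at most this many text pixels.
    min_width: runs narrower than this are ignored (e.g. gaps between letters).
    """
    # Vertical profile: number of text pixels at each x within the row span.
    profile = np.count_nonzero(text_mask[y0:y1], axis=0)
    empty = profile <= max_ink

    # Pad with False on both sides so every empty run has a start and an end.
    padded = np.concatenate(([False], empty, [False])).astype(np.int8)
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1)

    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_width]


# Synthetic example: three "text columns" on a 20 x 30 mask.
mask = np.zeros((20, 30), dtype=bool)
mask[:, 2:8] = True
mask[:, 14:20] = True
mask[:, 25:28] = True

gaps = column_gaps(mask, 0, 20)
separators = [(s + e) // 2 for s, e in gaps]  # middle of each gap
print(gaps, separators)  # [(8, 14), (20, 25)] [11, 22]
```

The middle of each gap is one reasonable place to draw a column border. On real scans, `max_ink` and `min_width` will probably need tuning so that noise does not hide a gap and spaces between words are not mistaken for column borders.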

Answered By: Paul Jurczak