Extract a table from an image and save it as another image with Python 3

Question:

I want to extract the tables from a PNG image and save each table as a separate image.
I have this image:

enter image description here

I would like to get two images, each containing one of the tables:

First image:

enter image description here

Second image:

enter image description here

How can I solve this?
Thanks in advance.

Asked By: Data Scientist


Answers:

Since the question is tagged with python and opencv, I assume you want a solution using this pipeline. The idea is to threshold the image, close small gaps in the resulting mask, find the outer contours, and save the bounding box of every contour that is large enough to be a table. Disclaimer: I’m new to Python in general, and especially to the Python API of OpenCV (C++ for the win). Comments, improvements, and pointers to Python no-gos are highly welcome!

import cv2

# Read input image
img = cv2.imread('images/B81om.png', cv2.IMREAD_COLOR)

# Convert to grayscale (cv2.imread returns channels in BGR order)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Simple threshold
_, thr = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)

# Morphological closing to improve mask
close = cv2.morphologyEx(255 - thr, cv2.MORPH_CLOSE, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))

# Find only outer contours (OpenCV 4.x returns two values here; 3.x returns three)
contours, _ = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

# Save a crop for every contour that is large enough to be a table
area_thr = 3000
i = 0
for cnt in contours:
    area = cv2.contourArea(cnt)
    if area > area_thr:
        i += 1
        x, y, width, height = cv2.boundingRect(cnt)
        # NumPy slices exclude the end index, so no "- 1" is needed
        cv2.imwrite(f'output{i}.png', img[y:y + height, x:x + width])

For the given example image, I get the following two output images:

Small table

Large table

Answered By: HansHirse

I can offer a slightly simpler option that doesn’t rely on a heuristic to locate the tables: Amazon Textract and its table detection feature. Note that this is a paid API.

You can call it through the amazon-textract-textractor package (pip install amazon-textract-textractor).

from textractor import Textractor
from textractor.data.constants import TextractFeatures
extractor = Textractor(profile_name="default")
document = extractor.analyze_document(
    file_source="./B81om.png",
    features=[TextractFeatures.TABLES]
)
document.visualize()

enter image description here

We can see that the API correctly detects both tables. Now it is simply a matter of iterating over them and using each bounding box to crop the right pixels out of the original image. The bounding-box values are relative to the image size, so they are scaled by the image width and height first. We can use OpenCV for that.

import cv2

img = cv2.imread('B81om.png')
table_images = []
h, w, _ = img.shape
for i, table in enumerate(document.tables):
    # Scale the relative bounding box to pixel coordinates
    top = int(table.bbox.y * h)
    bottom = int((table.bbox.y + table.bbox.height) * h)
    left = int(table.bbox.x * w)
    right = int((table.bbox.x + table.bbox.width) * w)
    table_image = img[top:bottom, left:right]
    table_images.append(table_image)
    cv2.imwrite(f'table_{i}.png', table_image)

The two tables are correctly extracted.

enter image description here

enter image description here
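
The bounding-box arithmetic used for the crop can be isolated in a small helper, which also makes it easy to clamp the result to the image borders. This is only a sketch: bbox_to_pixels is a hypothetical name of mine, not part of the textractor package, and it assumes the box is given as fractions of the image size (which is what the multiplication by h and w above implies). The numbers in the example are made up.

```python
def bbox_to_pixels(x, y, width, height, img_w, img_h):
    """Map a relative (0..1) box to clamped pixel bounds (left, top, right, bottom)."""
    left = max(0, int(x * img_w))
    top = max(0, int(y * img_h))
    right = min(img_w, int((x + width) * img_w))
    bottom = min(img_h, int((y + height) * img_h))
    return left, top, right, bottom


# Hypothetical box on an 800 x 400 image
inside = bbox_to_pixels(0.25, 0.125, 0.5, 0.25, img_w=800, img_h=400)

# A box that runs past the right and bottom edges is clamped to the image size
clamped = bbox_to_pixels(0.75, 0.5, 0.5, 0.75, img_w=800, img_h=400)
```

With the returned values, the crop is img[top:bottom, left:right], the same slice as in the loop above.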


We can also extract the content of each table into a pandas DataFrame (though the cell values have been redacted in the samples).

for table in document.tables:
    print(table.to_pandas())

Output:

          0      1            2         3            4    5 6
0  Quantité  Unité  Désignation  P.u. TTC  Montant TTC  Tva
1                                                         X
2

     0        1     2         3        4
0  Tva  Libellé  Taux  Base H.T  Montant
1    X

Answered By: Thomas