Extract a table from an image into another image with Python 3

Question

I want to extract a table from a PNG image and save that table as another image.
I have this image:

I would like to get two images containing the tables:

First image:


Second image:


How can I solve this?
Thanks in advance.

Asked By: Data Scientist



Answer 1

Since the question is tagged with python and opencv, I assume you want a solution using this pipeline. Please have a look at the following solution. Disclaimer: I'm new to Python in general, and especially to the Python API of OpenCV (C++ for the win). Comments, improvements, and pointers to Python no-gos are highly welcome!

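import cv2

# Read input image (OpenCV loads color images in BGR channel order)
img = cv2.imread('images/B81om.png', cv2.IMREAD_COLOR)
if img is None:
    raise FileNotFoundError('images/B81om.png could not be read')

# Convert to gray scale image (BGR, not RGB, since that is what imread returns)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Simple threshold: pixels brighter than 200 become white (background)
_, thr = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)

# Invert so lines and text are white, then apply a morphological closing
# to bridge small gaps in the table borders
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
close = cv2.morphologyEx(255 - thr, cv2.MORPH_CLOSE, kernel)

# Find only outer contours; [-2] selects the contour list in both
# OpenCV 3 (three return values) and OpenCV 4 (two return values)
contours = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]

# Save images for large enough contours
areaThr = 3000
i = 0
for cnt in contours:
    area = cv2.contourArea(cnt)
    if area > areaThr:
        i += 1
        x, y, width, height = cv2.boundingRect(cnt)
        # Slice ends are exclusive, so y+height and x+width keep the last row and column
        cv2.imwrite(f'output{i}.png', img[y:y + height, x:x + width])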
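One thing to watch: OpenCV does not document any particular spatial order for the contours returned by findContours, so output1.png is not guaranteed to be the top table. Here is a minimal sketch, assuming the boxes come from cv2.boundingRect as in the loop above, that sorts them by top edge (ties broken by left edge) before cropping. The function name crop_in_reading_order is hypothetical, and the demo uses a blank NumPy array in place of the real scan.

```python
import numpy as np


def crop_in_reading_order(img, boxes):
    """Crop (x, y, width, height) boxes out of img, sorted by top edge, then left edge."""
    # Sort by top edge first; boxes with the same top edge are ordered left to right
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    # NumPy slice ends are exclusive, so y + height and x + width keep the last row/column
    return [img[y:y + height, x:x + width] for x, y, width, height in ordered]


if __name__ == "__main__":
    # Blank stand-in "page": 100 rows, 50 columns
    page = np.zeros((100, 50), dtype=np.uint8)
    crops = crop_in_reading_order(page, [(0, 60, 10, 20), (5, 10, 30, 20)])
    print([c.shape for c in crops])  # [(20, 30), (20, 10)]
```

In the loop above, this would mean first collecting boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > areaThr] and then writing each returned crop with cv2.imwrite.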
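The fixed threshold of 200 assumes a near-white background. If scans vary in brightness, Otsu's method (a swapped-in alternative, not part of the answer above) picks the threshold from the image histogram by maximizing the between-class variance. OpenCV does this when cv2.THRESH_OTSU is added to the flags, e.g. cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU). The NumPy sketch below only illustrates what that criterion computes; otsu_threshold is a hypothetical helper name, and OpenCV's own implementation may break ties differently.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the Otsu threshold t for a uint8 grayscale image.

    Pixels with value > t would be set to the max value, as with THRESH_BINARY.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)        # weight of the class with values <= t
    mu = np.cumsum(prob * levels)  # cumulative mean up to t
    mu_total = mu[-1]              # mean of the whole image
    # Between-class variance for every candidate threshold t
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    # Thresholds that leave one class empty give 0/0; treat them as zero variance
    sigma_b[~np.isfinite(sigma_b)] = 0.0
    return int(np.argmax(sigma_b))
```

On a document scan, the resulting t replaces the hard-coded 200, and the rest of the pipeline stays the same.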

For the given example image, I get the following two output images:

Answered By: HansHirse

Answer 2

I can offer a slightly simpler option that doesn't involve a heuristic to localize the tables: Amazon Textract and its table detection feature. Note that this is a paid API.

You can call it from Python with the amazon-textract-textractor package (pip install amazon-textract-textractor).

from textractor import Textractor
from textractor.data.constants import TextractFeatures
extractor = Textractor(profile_name="default")
document = extractor.analyze_document(
    file_source="./B81om.png",
    features=[TextractFeatures.TABLES]
)
document.visualize()

The visualization shows that the API correctly detects both tables. Now it is simply a matter of iterating through them and converting each bounding box into pixel coordinates so the right region can be cropped out of the original image. We can use OpenCV for that.

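import cv2

img = cv2.imread('B81om.png')
h, w = img.shape[:2]
table_images = []
for i, table in enumerate(document.tables):
    # Textract bounding boxes are fractions of the page size,
    # so scale them by the image height/width to get pixel coordinates
    y0 = int(table.bbox.y * h)
    y1 = int((table.bbox.y + table.bbox.height) * h)
    x0 = int(table.bbox.x * w)
    x1 = int((table.bbox.x + table.bbox.width) * w)
    table_image = img[y0:y1, x0:x1]
    table_images.append(table_image)
    cv2.imwrite(f'table_{i}.png', table_image)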
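Truncating with int() can shave a pixel off the right and bottom borders of a table. Below is a small helper that floors the start and ceils the end, optionally adds padding, and clamps the result to the image so the slice never goes out of range (with padding, a negative start would otherwise wrap around in NumPy slicing). The name ratio_box_to_pixels and the pad parameter are hypothetical, and the inputs are assumed to be the fractional values from table.bbox used above.

```python
import math


def ratio_box_to_pixels(x, y, width, height, img_w, img_h, pad=0):
    """Convert a box given as fractions of the page size into clamped pixel
    bounds (x0, y0, x1, y1), usable as img[y0:y1, x0:x1].

    pad widens the box by that many pixels on each side.
    """
    # Floor the start and ceil the end so rounding never cuts into the table
    x0 = math.floor(x * img_w) - pad
    y0 = math.floor(y * img_h) - pad
    x1 = math.ceil((x + width) * img_w) + pad
    y1 = math.ceil((y + height) * img_h) + pad
    # Clamp to the image bounds
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, img_w), min(y1, img_h)
    return x0, y0, x1, y1
```

Inside the loop, this could be called as ratio_box_to_pixels(table.bbox.x, table.bbox.y, table.bbox.width, table.bbox.height, w, h, pad=2), followed by cropping img[y0:y1, x0:x1].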

The two tables are correctly extracted.

We can also extract the content of each table into a pandas DataFrame (although the contents have been redacted in the sample image).

for i, table in enumerate(document.tables):
    print(table.to_pandas())

          0      1            2         3            4    5 6
0  Quantité  Unité  Désignation  P.u. TTC  Montant TTC  Tva  
1                                                         X  
2                              
                              
     0        1     2         3        4
0  Tva  Libellé  Taux  Base H.T  Montant
1    X
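In the printed frames, row 0 holds the column headers (Quantité, Unité, Désignation and so on in the first table; Tva, Libellé, Taux and so on in the second) rather than data. Here is a minimal pandas sketch, assuming to_pandas() puts the header text in the first row as in the output above, that promotes that row to column names. The function name promote_header_row is hypothetical.

```python
import pandas as pd


def promote_header_row(df: pd.DataFrame) -> pd.DataFrame:
    """Use the first row of a table as column names and drop it from the data."""
    # Strip stray whitespace around the header text
    header = [str(v).strip() for v in df.iloc[0]]
    # Keep the remaining rows and renumber them from 0
    body = df.iloc[1:].reset_index(drop=True)
    body.columns = header
    return body
```

For example, promote_header_row(table.to_pandas()).to_csv(f'table_{i}.csv', index=False) inside the loop above would save each table with proper headers. Blank header cells, like the last column of the first table, become empty-string column names.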

Answered By: Thomas
