How to find model no. in an image generated by OCR?

Question:

(Examples are changed but the idea is the same)

I’m trying to find an SRD Model No. on a product label from a live camera feed.

Here’s a label example:

[Image: example product label]

The conditions are:

  1. Different generations of the product structure the information on the label differently.
  2. The SRD Model No. has a variable length that differs from generation to generation.
  3. The SRD Model No. can contain either only numbers or both numbers and letters, depending on the generation.

So the question is: is there a way to find the SRD Model No. as a substring of a string generated by OCR, other than hard-coding all possible variations?

Asked By: art.less.code


Answers:

Use OCR to convert the product label to text. Search for the string "SRD Model:", skip the space that follows it, and take the text up to the next whitespace character; that is your SRD number.
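The search described above could be sketched with a regular expression. This is only a minimal sketch: the name find_srd and the pattern itself are hypothetical, and the pattern assumes the label reads "SRD", optionally followed by "Model" and/or "No." and a colon. It also assumes, based on the conditions in the question, that a model number always contains at least one digit.

```python
import re

# Hypothetical pattern (an assumption, not taken from the real labels):
# "SRD", optional "Model", optional "No.", optional ":", then an
# alphanumeric token that must contain at least one digit.
SRD_PATTERN = re.compile(
    r"SRD\s+(?:Model\s*)?(?:No\.?\s*)?:?\s*((?=[A-Za-z0-9]*\d)[A-Za-z0-9]+)",
    re.IGNORECASE,
)

def find_srd(ocr_text):
    """Return the token following the 'SRD Model' anchor, or None if absent."""
    match = SRD_PATTERN.search(ocr_text)
    return match.group(1) if match else None

print(find_srd("Lot 88\nSRD Model: 5427G2\nMade in 2020"))
```

Because the pattern captures "everything up to the next non-alphanumeric character", it does not depend on the length of the model number, which is the variable part across generations.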

Answered By: Angus Comber

Here is an example script following @Angus Comber’s suggestion:

import cv2
import pytesseract
from pytesseract import Output

def extract_SRD(filename):
    # Load the image, convert it to grayscale and blur it slightly to reduce noise before OCR
    img = cv2.imread(filename)
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img_blur = cv2.GaussianBlur(img_gray, (3, 3), 0)

    # Run Tesseract and collect the recognized words as a dictionary of lists
    mydata = pytesseract.image_to_data(img_blur, output_type=Output.DICT, config='--psm 6')
    # Take the second entry after 'SRD', which skips the word "Model"
    SRD = mydata['text'][mydata['text'].index('SRD')+2]

    return SRD

filename = 'wm3tG.png'

SRD = extract_SRD(filename)

print(SRD)

This snippet returns:
5427G2

The important line here is SRD = mydata['text'][mydata['text'].index('SRD')+2]. This is where you define the logic used to retrieve the SRD code. In this example, I simply query the second string of characters after SRD, thus skipping the word "Model".

I would suggest fine-tuning this example to check whether a specific value in the output dictionary contains "SRD". Then you may simply look for the next string of characters:

  • if this next string contains "Model", then return the one after that;
  • if not, return that string of characters.
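
The two rules above could be sketched as follows. This is a rough sketch rather than a tested solution: srd_from_tokens is a hypothetical helper, it takes the list of words (for example mydata['text']) as input, and it assumes that empty entries in that list can be dropped before searching.

```python
def srd_from_tokens(tokens):
    """Return the word that follows 'SRD' (and 'Model', if present), or None."""
    # Assumption: empty OCR entries carry no information and can be skipped.
    words = [t for t in tokens if t.strip()]
    for i, word in enumerate(words):
        if "SRD" in word:
            following = words[i + 1:i + 3]
            if not following:
                return None
            if "Model" in following[0]:
                # The next word is the "Model" label, so take the one after it.
                return following[1] if len(following) > 1 else None
            # No "Model" label: the next word is the number itself.
            return following[0]
    return None

print(srd_from_tokens(["", "SRD", "Model", "5427G2", ""]))
```

Returning None instead of raising an error when "SRD" is missing may be convenient for a live camera feed, where many frames will not contain a readable label.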
Answered By: Sheldon