Having a hard time reading text from a PNG file using Python

Question:

image

I’m having a hard time extracting the text CHUBB from the image above. I tried several image preprocessing techniques together with pytesseract, without success.

My output: '\x0c'

Expected output: ‘CHUBB’

Any help would be appreciated

My attempt:

import cv2
import pytesseract

img = cv2.imread('image1_1.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Adaptive Gaussian threshold with block size 199 and constant 5
thresh1 = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                cv2.THRESH_BINARY, 199, 5)

cv2.imshow('Adaptive Gaussian', thresh1)

# Close the preview window when Esc (key code 27) is pressed
if cv2.waitKey(0) & 0xff == 27:
    cv2.destroyAllWindows()

# Adding custom options
custom_config = r'--psm 3'
text = pytesseract.image_to_string(thresh1, config=custom_config)
print(repr(text))  # repr makes control characters such as '\x0c' visible
Asked By: masterp

Answers:

I think the problem is that the text CHUBB is too large for the picture: it fills nearly the whole image.
If we shrink the text a little or paste the image onto a larger canvas, pytesseract works fine:

import pytesseract
from PIL import Image

img = Image.open('test.png')  # load the image
new_img = Image.new('RGB', (400, 400), color='white')  # create a larger white canvas
# Paste the original CHUBB image onto the canvas. mask=img uses the image's
# alpha band, so this assumes a PNG with transparency (e.g. mode RGBA).
new_img.paste(im=img, box=(100, 100), mask=img)
text = pytesseract.image_to_string(new_img, lang='eng', config='--psm 12')  # OCR
print(text)  # CHUBB
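
The same idea can be sketched without the hard-coded 400x400 canvas by deriving the canvas size from the image itself. This is only an illustration: the helper name `pad_on_white` and the `margin` parameter are made up for this example, and the transparent test image is a synthetic stand-in for the real file.

```python
from PIL import Image


def pad_on_white(img, margin=100):
    """Return a copy of img on a white canvas with `margin` pixels on every side."""
    canvas = Image.new('RGB', (img.width + 2 * margin, img.height + 2 * margin), 'white')
    # Use the image itself as the paste mask only when it is RGBA, so that
    # transparent areas keep the white background of the canvas.
    mask = img if img.mode == 'RGBA' else None
    canvas.paste(img, (margin, margin), mask)
    return canvas


# Synthetic stand-in for the PNG: a fully transparent 200x60 RGBA image
demo = Image.new('RGBA', (200, 60), (0, 0, 0, 0))
padded = pad_on_white(demo, margin=100)
print(padded.size)
```

The padded image can then be passed to pytesseract.image_to_string in place of new_img. Palette images with transparency are not handled here and would probably need converting to RGBA first.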

FYI, here is what each page segmentation mode (--psm) returns for the padded image:

for i in range(1, 14):
    try:
        text = pytesseract.image_to_string(new_img, lang='eng', config=f'--psm {i}')  # OCR
        print('psm', i, text)
    except Exception:
        # Skip modes that raise an error (psm 2 is missing from the output)
        pass

Output:

psm 1 CHUBB
 
psm 3 CHUBB
 
psm 4 CHUBB
 
psm 5 0
u
J
I
U
 
psm 6 CHUBB
 
psm 7 CHUBB
 
psm 8 7
 
psm 9 CHUBB
 
psm 10 CHUBB
 
psm 11 CHUBB
 
psm 12 CHUBB
 
psm 13 7
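
To read the list above at a glance: every mode that ran returns CHUBB except psm 5 (a single block of vertically aligned text), psm 8 (a single word) and psm 13 (raw line). A small sketch of how such results could be compared automatically is below. The helper names `clean_ocr` and `modes_matching` are hypothetical, and the dictionary is transcribed by hand from the printed output. Stripping matters because the raw string may carry trailing whitespace such as the form feed '\x0c' that the question reported.

```python
def clean_ocr(text):
    # str.strip() removes surrounding whitespace, which includes
    # newlines and the form feed character '\x0c'.
    return text.strip()


def modes_matching(results, expected):
    """Return the sorted psm values whose cleaned OCR text equals `expected`."""
    return sorted(psm for psm, raw in results.items() if clean_ocr(raw) == expected)


# Transcribed from the printed output above (psm 2 produced no result)
results = {
    1: "CHUBB", 3: "CHUBB", 4: "CHUBB",
    5: "0\nu\nJ\nI\nU",
    6: "CHUBB", 7: "CHUBB",
    8: "7",
    9: "CHUBB", 10: "CHUBB", 11: "CHUBB", 12: "CHUBB",
    13: "7",
}

print(modes_matching(results, "CHUBB"))  # [1, 3, 4, 6, 7, 9, 10, 11, 12]
```

In a live run the dictionary would be filled inside the loop instead of being typed in.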
Answered By: Yu Kuo