tesseract | py4u

How to use tessedit_write_images with pytesseract?

Question: I’m using pytesseract 0.3.10 with tesseract 5.3.0. I want to take a look at how tesseract processed my images. I tried setting tessedit_write_images to true via: import pytesseract as pt pt.image_to_string(crop_img, lang='eng+deu+fra+spa', config="--psm 6 -c tessedit_write_images=1") But this is not working. The tessinput.tif file is nowhere to be …

Total answers: 1

Script doesn't execute when wrapped inside of a function

Question: When I execute the script below with python3 ocr-test.py, it runs correctly: from PIL import Image import pytesseract # If you don't have tesseract executable in your PATH, include the following: pytesseract.pytesseract.tesseract_cmd = r'/opt/homebrew/bin/tesseract' # Simple image to string print(pytesseract.image_to_string(Image.open('receipt1.jpg'))) However, when I execute the …

Total answers: 2

Python Pytesseract not detecting strings on image

Question: Hi, I have Python code using tesseract; the goal is to detect strings in a screenshot. Code: import pytesseract import cv2 import pyautogui import numpy as np pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' image = pyautogui.screenshot() image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR) cv2.imwrite("imagesgameScreenshot.png", image) img = cv2.imread('imagesgameScreenshot.png') img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) …

Total answers: 1

Tesseract OCR accents problems, image enhancement not enough

Question: I really need your help with Tesseract. I’m using Tesseract and pdf2image to extract information from a scanned PDF file. My problem is that Tesseract confuses the accented characters é, è and ê (I’m French), as well as the lowercase "i" and uppercase "I". I tried processing …

Total answers: 1

Unable to read captcha text with python tesseract and OpenCV

Question: I am trying to read text from this image using Python with OpenCV. However, it is not able to read it. import cv2 as cv import numpy as np from matplotlib import pyplot as plt img = cv.imread(file_path, 0) img = cv.medianBlur(img, 5) ret, th1 = cv.threshold(img, 127, 255, cv.THRESH_BINARY) th2 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_MEAN_C, cv.THRESH_BINARY, 11, 2) th3 …

Total answers: 1

Removing black background/black stray straight lines from a captcha in python

Question: I am trying to read text from this image using Python with OpenCV. However, the black background in the corners of this picture is interfering with the text output and producing wrong text. I tried to use Adaptive Gaussian Thresholding in OpenCV using code: import …

Total answers: 2

How can I read text on the screen presented as an image using sikulix IDE?

Question: I’m using sikulix IDE version 2.0.5 on Windows 10 and it has worked well so far. I want to read a specific single line of text on the screen using sikulix IDE. I can’t copy the text to the clipboard …

Total answers: 1

How to get data from chart image while preserving order?

Question: I have a few images like these (Image 1, Image 2). I can extract the names and roles from these images using an OCR tool like tesseract from Python, but I want to preserve the hierarchy along the way. Please provide some interesting ways to …

Total answers: 1

How to remove noise around numbers using OpenCV

Question: I’m trying to use Tesseract-OCR to get the readings on the images below, but I am having issues getting consistent results with the spotted background. I have the following configuration for pytesseract: CONFIG = f"--psm 6 -c tessedit_char_whitelist=01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄabcdefghijklmnopqrstuvwxyzåäö.,-" I have also tried the image pre-processing below with some good results, …

Total answers: 2
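
The configuration string quoted in the question above can also be assembled programmatically. The following is only a minimal sketch: the helper name build_tesseract_config is hypothetical and not part of pytesseract, while --psm and -c name=value are the same Tesseract command-line options the question uses. The whitelist here is shortened to digits purely for illustration.

```python
def build_tesseract_config(psm, variables=None):
    """Assemble a config string suitable for pytesseract's `config` argument.

    Hypothetical helper for illustration; `variables` maps Tesseract
    variable names to values, each emitted as `-c name=value`.
    """
    parts = [f"--psm {psm}"]  # note the two ASCII hyphens, not a dash character
    for name, value in (variables or {}).items():
        parts.append(f"-c {name}={value}")
    return " ".join(parts)


# Illustrative whitelist (digits only), not the one from the question.
CONFIG = build_tesseract_config(6, {"tessedit_char_whitelist": "0123456789"})
print(CONFIG)  # --psm 6 -c tessedit_char_whitelist=0123456789
```

The resulting string would then be passed as config=CONFIG to a call such as pytesseract.image_to_string. Building it from plain ASCII pieces also avoids the typographic dashes that copy-and-paste can introduce in front of psm.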

Use Tesseract OCR to extract text from a scanned pdf folders

Question: I have code to extract text from scanned and normal PDF files using Tesseract OCR. But I want my code to convert a folder of PDFs rather than a single PDF file, and then the extracted text files will be stored …

Total answers: 1