How to remove noise around numbers using OpenCV
Question:
I’m trying to use Tesseract-OCR to read the numbers in the images below, but I’m having trouble getting consistent results because of the spotted background. This is my pytesseract configuration:
CONFIG = "--psm 6 -c tessedit_char_whitelist=01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄabcdefghijklmnopqrstuvwxyzåäö.,-"
I have also tried the image pre-processing below, with some good results, but they are still not perfect:
blur = cv2.blur(img,(4,4))
(T, threshInv) = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
What I want is to consistently identify the numbers and the decimal separator. What image pre-processing could help get consistent results on images like the ones below?
Answers:
You may find a solution using a slightly more complex approach: filtering in the frequency domain instead of the spatial domain. The thresholds might require some tweaking depending on how Tesseract performs with the output images.
Implementation:
import cv2
import numpy as np
from matplotlib import pyplot as plt
# Raw string so the backslashes in the Windows path are not treated as escapes (e.g. \n)
img = cv2.imread(r'C:\Test\number.jpg', cv2.IMREAD_GRAYSCALE)
# Perform 2D FFT
f = np.fft.fft2(img)
fshift = np.fft.fftshift(f)
magnitude_spectrum = 20*np.log(np.abs(fshift))
# Squash all of the frequency magnitudes above a threshold
# (boolean mask instead of a per-pixel Python loop)
fshift[magnitude_spectrum > 195] = 0
# Inverse FFT back into the real-spatial-domain
f_ishift = np.fft.ifftshift(fshift)
img_back = np.fft.ifft2(f_ishift)
img_back = np.real(img_back)
img_back = cv2.normalize(img_back, None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
out_img = np.copy(img)
# Use the inverted FFT image to keep only the black values below a threshold
out_img[img_back < 100] = 0
out_img[img_back >= 100] = 255
plt.subplot(131),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(132),plt.imshow(img_back, cmap = 'gray')
plt.title('Reversed FFT'), plt.xticks([]), plt.yticks([])
plt.subplot(133),plt.imshow(out_img, cmap = 'gray')
plt.title('Output'), plt.xticks([]), plt.yticks([])
plt.show()
Output:
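The same frequency-domain idea can be wrapped in a small function that avoids per-pixel Python loops. This is only a sketch: the function name `suppress_strong_frequencies`, the small epsilon guarding the logarithm, and the synthetic test image are assumptions made for illustration, and the threshold still has to be tuned per image as noted above.

```python
import numpy as np


def suppress_strong_frequencies(img, log_mag_threshold):
    """Zero FFT coefficients whose log-magnitude exceeds the threshold,
    then return the inverse transform rescaled to the range 0..255."""
    fshift = np.fft.fftshift(np.fft.fft2(img.astype(np.float64)))
    # Same scaling as the answer (natural log); epsilon avoids log(0)
    log_mag = 20 * np.log(np.abs(fshift) + 1e-12)
    fshift[log_mag > log_mag_threshold] = 0
    back = np.real(np.fft.ifft2(np.fft.ifftshift(fshift)))
    lo, hi = back.min(), back.max()
    if hi == lo:
        return np.zeros_like(back)  # flat result, nothing to rescale
    return (back - lo) / (hi - lo) * 255


# Hypothetical stand-in for a real scan: white page, regular dots, dark bar
demo = np.full((32, 32), 255, dtype=np.uint8)
demo[::4, ::4] = 120      # periodic background dots
demo[12:20, 8:24] = 0     # thick "text" stroke
filtered = suppress_strong_frequencies(demo, 195)
unfiltered = suppress_strong_frequencies(demo, np.inf)  # nothing removed
print(filtered.shape, filtered.min(), filtered.max())
```

Passing `np.inf` as the threshold removes nothing, which is a quick way to check that the round trip through the FFT reproduces the input before experimenting with real thresholds.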
Median Blur Implementation:
import cv2
import numpy as np
# Raw string so the backslashes in the Windows path are not treated as escapes (e.g. \n)
img = cv2.imread(r'C:\Test\number.jpg', cv2.IMREAD_GRAYSCALE)
blur = cv2.medianBlur(img, 3)
# Force near-black pixels to pure black
blur[blur < 20] = 0
cv2.imshow("Test", blur)
cv2.waitKey()
Output:
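A 3x3 median replaces each pixel with the median of its neighbourhood, so a background dot that covers only one pixel is outvoted by its white neighbours while thicker strokes survive. The NumPy-only sketch below illustrates that effect on a made-up image; it is not `cv2.medianBlur` itself, and the helper name `median3x3` and the replicated-edge border handling are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median3x3(img):
    """3x3 median filter with replicated edges (NumPy-only illustration)."""
    padded = np.pad(img, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))   # shape (H, W, 3, 3)
    return np.median(windows, axis=(2, 3)).astype(img.dtype)


page = np.full((12, 12), 255, dtype=np.uint8)
page[2, 2] = 0          # isolated one-pixel background dot
page[5:10, 5:10] = 0    # thick stroke, 5x5 pixels
cleaned = median3x3(page)
print(cleaned[2, 2], cleaned[7, 7])   # prints: 255 0
```

The corners of the thick stroke are rounded off by the same mechanism, which is one reason a median filter alone can damage thin glyph features such as the decimal separator.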
Final Edit:
Applying Eumel’s solution and then appending this bit of code to the end of it yields a 100% successful result:
img[pat_thresh_1==1] = 255
img[pat_thresh_15==1] = 255
img[pat_thresh_2==1] = 255
img[pat_thresh_25==1] = 255
img[pat_thresh_3==1] = 255
img[pat_thresh_35==1] = 255
img[pat_thresh_4==1] = 255
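# Eumel's code above this line
# (this part also needs: import pytesseract and from PIL import Image)
img = cv2.erode(img, np.ones((3, 3), np.uint8))
cv2.imwrite("out.png", img)
cv2.imshow("Test", img)
cv2.waitKey()  # keep the window open until a key is pressed
print(pytesseract.image_to_string(Image.open("out.png"), lang='eng', config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789.,'))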
Output Image Examples:
Whitelisting the characters Tesseract may return also appears to help quite a bit in preventing false identifications.
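As a small illustration of the whitelisting idea, the config string can be assembled from its parts and the same whitelist reused to sanity-filter the OCR result. The helper names `build_tesseract_config` and `keep_whitelisted` are hypothetical, and the post-filter is an extra guard that the answer does not describe; no OCR is run here.

```python
def build_tesseract_config(psm, oem, whitelist):
    """Assemble the config string passed to pytesseract.image_to_string."""
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


def keep_whitelisted(text, whitelist):
    """Drop any character (including whitespace) that is not whitelisted."""
    return "".join(ch for ch in text if ch in whitelist)


NUMERIC = "0123456789.,"
config = build_tesseract_config(10, 3, NUMERIC)
print(config)
# --psm 10 --oem 3 -c tessedit_char_whitelist=0123456789.,
print(keep_whitelisted("17.160.000,00\n", NUMERIC))
# 17.160.000,00
```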
That was a challenge, but I think I have an interesting approach: pattern matching.
If you zoom in, you realize that the background pattern only has 4 possible dots: a single full pixel, a double full pixel, and a double pixel with a medium-intensity pixel on its left or right. So what I did was grab these 4 patterns from the image with 17.160.000,00 and get to work. You may want to save them to load again later; I just grabbed them on the fly.
img = cv2.imread('C:/Users/***/17.jpg', cv2.IMREAD_GRAYSCALE)
pattern_1 = img[2:5,1:5]
pattern_2 = img[6:9,5:9]
pattern_3 = img[6:9,11:15]
pattern_4 = img[9:12,22:26]
# just to show it carries over to other pics ;)
img = cv2.imread('C:/Users/****/6.jpg', cv2.IMREAD_GRAYSCALE)
Actual Pattern Matching
Next, we match all the patterns and threshold the results to find all occurrences. I used 0.7, but you can play around with it a little. These patterns take off some pixels on the side and only match a single pixel on the left, so we pad twice (once with an extra pixel) to hit both for the first 3 patterns. The last one is the single pixel, so it doesn't need that.
res_1 = cv2.matchTemplate(img,pattern_1,cv2.TM_CCOEFF_NORMED )
thresh_1 = cv2.threshold(res_1,0.7,1,cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_1 = np.pad(thresh_1,((1,1),(1,2)),'constant')
pat_thresh_15 = np.pad(thresh_1,((1,1),(2,1)), 'constant')
res_2 = cv2.matchTemplate(img,pattern_2,cv2.TM_CCOEFF_NORMED )
thresh_2 = cv2.threshold(res_2,0.7,1,cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_2 = np.pad(thresh_2,((1,1),(1,2)),'constant')
pat_thresh_25 = np.pad(thresh_2,((1,1),(2,1)), 'constant')
res_3 = cv2.matchTemplate(img,pattern_3,cv2.TM_CCOEFF_NORMED )
thresh_3 = cv2.threshold(res_3,0.7,1,cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_3 = np.pad(thresh_3,((1,1),(1,2)),'constant')
pat_thresh_35 = np.pad(thresh_3,((1,1),(2,1)), 'constant')
res_4 = cv2.matchTemplate(img,pattern_4,cv2.TM_CCOEFF_NORMED )
thresh_4 = cv2.threshold(res_4,0.7,1,cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_4 = np.pad(thresh_4,((1,1),(1,2)),'constant')
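To make the padding step concrete: `cv2.matchTemplate` returns a score map of size (H - h + 1) x (W - w + 1), indexed by the template's top-left corner. For the 3x4 patterns used here, padding by ((1,1),(1,2)) restores the image size and moves each hit one pixel down and one pixel right, and ((1,1),(2,1)) moves it one column further. The NumPy-only sketch below checks this on a made-up hit map; the image size and the hit position are assumptions.

```python
import numpy as np

H, W = 8, 10    # hypothetical image size
h, w = 3, 4     # pattern size, as in img[2:5, 1:5]

# Pretend the thresholded match map has one hit whose top-left corner
# is at row 2, column 3 of the image
hits = np.zeros((H - h + 1, W - w + 1), dtype=np.uint8)
hits[2, 3] = 1

left_mask = np.pad(hits, ((1, 1), (1, 2)), "constant")
right_mask = np.pad(hits, ((1, 1), (2, 1)), "constant")

print(left_mask.shape)                # (8, 10), same as the image
print(np.argwhere(left_mask == 1))    # [[3 4]]
print(np.argwhere(right_mask == 1))   # [[3 5]]
```

Together the two masks cover the two horizontally adjacent pixels in the middle row of the pattern, which is where the double-pixel dots sit.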
Editing the Image
Now the only thing left to do is remove all the matches from the image. Since we have a mostly white background, we just set them to 255 to blend in.
img[pat_thresh_1==1] = 255
img[pat_thresh_15==1] = 255
img[pat_thresh_2==1] = 255
img[pat_thresh_25==1] = 255
img[pat_thresh_3==1] = 255
img[pat_thresh_35==1] = 255
img[pat_thresh_4==1] = 255
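The seven assignments above can also be written as a single step by OR-ing the masks together first. This is a minimal sketch on toy arrays: the helper name `whiten_matches` is hypothetical, and it returns a copy rather than editing the image in place as the answer does.

```python
import numpy as np


def whiten_matches(img, masks, fill=255):
    """Set every pixel flagged by any mask to `fill`; returns a copy."""
    combined = np.logical_or.reduce([m == 1 for m in masks])
    out = img.copy()
    out[combined] = fill
    return out


img_demo = np.array([[0, 255, 40], [90, 0, 255]], dtype=np.uint8)
mask_a = np.array([[1, 0, 0], [0, 0, 0]], dtype=np.uint8)
mask_b = np.array([[0, 0, 1], [1, 0, 0]], dtype=np.uint8)
result = whiten_matches(img_demo, [mask_a, mask_b])
print(result.tolist())   # [[255, 255, 255], [255, 0, 255]]
```

With the real data, the list would hold `pat_thresh_1` through `pat_thresh_4`, including the shifted `_15`, `_25` and `_35` variants.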
Output
Edit:
Take a look at Abstract's answer as well for refining this output and fine-tuning Tesseract.