Python 3 KeyError when key exists in dictionary of image files
Question:
Using Python and os to create a dictionary of key/value pairs for the files in a directory, and OpenCV with Tesseract to preprocess images and extract/print text.
End goal: create a for loop that takes each image in the directory, appends the filename as a string to the path in grocery_cve_project, processes each image, and extracts the text to be read in the console.
import os
print('os imported')
# import packages
from PIL import Image
import pytesseract
import cv2
print('packages imported')
### Part 1: store image names in dictionary
dir_name = ".\\grocery_cve_project"
# This is where we get our array
# of file names and store in results
result = os.listdir(dir_name)
key_index_store = {}
for i, e in enumerate(result):
    key_index_store[i] = e
    #print(i, e)
#print("Our key value store is: ")
#print(key_index_store)
# The types of file names we care about.
photo_extensions = [".jpg", ".png"]
# declare the tesseract executable path
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
### Part 2: image processing
for e in key_index_store[e]:
    image_to_ocr = cv2.imread('grocery_cve_project_\\%s' % 'e')
    print(image_to_ocr)
    # convert to gray
    preprocessed_img = cv2.cvtColor(image_to_ocr, cv2.COLOR_BGR2GRAY)
    # step 2: do binary and Otsu thresholding
    preprocessed_img = cv2.threshold(preprocessed_img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    # step 3: Median Blur to remove noise in image
    preprocessed_img = cv2.medianBlur(preprocessed_img, 3)
    '''Step 4: SAVE AND LOAD IMAGE AS PIL image'''
    # step 1: Save the processed image to convert to PIL image
    for i in key_index_store[i]:
        cv2.imwrite(("tempdir\\temp_img_%s.jpg" % 'i'), preprocessed_img)
    # step 2: load the image as a PIL/Pillow image
    preprocessed__pil_img = Image.open('temp_img.jpg')
    # step 1: do OCR of image using Tesseract
    text_extracted = pytesseract.image_to_string(preprocessed__pil_img)
    #Step 2: print the text
    print(text_extracted)
(Grocery_env) D:\Documents\Python\Multiple file array>"1. grocery tesseract.py"
os imported
packages imported
Traceback (most recent call last):
  File "D:\Documents\Python\Multiple file array\1. grocery tesseract.py", line 44, in <module>
    for e in key_index_store[e]:
KeyError: 'file_99.png'
Research indicates this error comes up when an item in the dictionary does not exist. However, if I run the code commented out in line 21, print(i, e), it prints the key/value pairs for all the files in the directory, and ‘file_99’ does exist at index 236, and physically in the given directory.
The directory for the image files is in the same folder as the source code.
Answers:
In the first part you populate the dictionary with numerical indexes:
key_index_store = {}
for i, e in enumerate(result):
    key_index_store[i] = e
This is a bit redundant, as your results are already indexed by number.
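To illustrate the redundancy, here is a minimal sketch with hypothetical file names standing in for the output of os.listdir(dir_name). It compares every dictionary lookup with plain list indexing:

```python
# Hypothetical file names standing in for os.listdir(dir_name).
result = ["file_1.png", "file_2.jpg", "file_99.png"]

# Same construction as in the question.
key_index_store = {}
for i, e in enumerate(result):
    key_index_store[i] = e

# Every lookup by integer key gives the same value as indexing the list.
same_lookups = all(key_index_store[i] == result[i] for i in range(len(result)))
print(same_lookups)
```

Since the two agree at every index, the list returned by os.listdir can be used directly.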
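Then, in the second part, you iterate over key_index_store[e]. At that point e still holds the last file name left over from the first loop, while the dictionary's keys are integers, which is why you get the KeyError.
This is most likely a mistake: remove the [e], and iterate over key_index_store.values() if you want the file names rather than the indexes.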
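The KeyError can be reproduced in isolation. The sketch below uses hypothetical file names in place of the real directory listing. It shows that the loop variable e keeps its last value after the first loop ends, so key_index_store[e] looks up a string in a dictionary whose keys are integers:

```python
# Hypothetical file names standing in for os.listdir(dir_name).
result = ["file_1.png", "file_2.jpg", "file_99.png"]

key_index_store = {}
for i, e in enumerate(result):
    key_index_store[i] = e

# Python loop variables persist after the loop: e is still the last file name.
leftover = e

try:
    key_index_store[e]          # looks up the key 'file_99.png'
    error_key = None
except KeyError as exc:
    error_key = exc.args[0]     # the key that was not found

print("leftover loop variable:", leftover)
print("missing key:", error_key)
print("present as a value:", leftover in key_index_store.values())
```

The file name is present in the dictionary, but only as a value. The lookup fails because it is not one of the keys.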
If I understood your code properly, I think you might be slightly confused about how to extract key/value pairs from dictionaries. But in this case the dict isn’t even necessary.
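For reference, a short sketch of the three standard ways to iterate over a dictionary, using a small hypothetical key_index_store rather than the real one:

```python
# Hypothetical contents; the real dictionary is built from os.listdir().
key_index_store = {0: "file_1.png", 1: "file_2.jpg"}

# Iterating over the dictionary itself yields its keys.
keys = [k for k in key_index_store]

# .values() yields the stored file names.
values = list(key_index_store.values())

# .items() yields (key, value) pairs, which can be unpacked in the loop header.
pairs = []
for index, filename in key_index_store.items():
    pairs.append((index, filename))

print(keys)
print(values)
print(pairs)
```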
You could write this all in a single loop:
for idx, filename in enumerate(result):
    image_to_ocr = cv2.imread(os.path.join(dir_name, filename))
    # ... your image processing code ...
    out_filename = os.path.join("tempdir", f"temp_img_{idx}.jpg")
    cv2.imwrite(out_filename, preprocessed_img)
    preprocessed_pil_img = Image.open(out_filename)
    # ... the rest ...
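The question defines photo_extensions but never uses it. Since cv2.imread returns None instead of raising when it cannot read a file, it may be worth filtering the directory listing before the loop. A minimal sketch, assuming a hypothetical helper named select_images and made-up file names:

```python
import os

def select_images(names, extensions=(".jpg", ".png")):
    """Return the names whose extension (case-insensitive) is in extensions.

    select_images is a hypothetical helper, not part of the original code.
    """
    return [n for n in names if os.path.splitext(n)[1].lower() in extensions]

# Made-up listing standing in for os.listdir("grocery_cve_project").
names = ["file_1.png", "notes.txt", "file_2.JPG", "Thumbs.db"]

images = select_images(names)

# os.path.join builds the path with the right separator, avoiding
# hand-written backslashes in string literals.
paths = [os.path.join("grocery_cve_project", n) for n in images]

print(images)
print(paths)
```

The filtered list can then be passed to enumerate in place of result in the loop above.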