Opening a PDF within a zip folder with fitz.open()
Question:
I have a function that opens a zip file, finds a PDF with a given filename, then reads the first page of the PDF to get some specific text. My issue is that after I locate the correct file, I can’t open it to read it. I have tried both a relative path within the zip folder and an absolute path in my downloads folder, and I keep getting these errors:
no such file: ‘Deliverables_Rev BPlans_Rev B.pdf’
no such file: ‘C:UsersMyProfileDownloadsDeliverables_Rev BPlans_Rev B.pdf’
I have been commenting out the os.path.join line to switch between the relative and absolute paths; self.prefs['download_path'] returns my downloads folder.
I’m not sure what the issue with the relative path is. Any insight would be helpful, as I think it has to do with trying to read out of a zipped folder.
from zipfile import ZipFile
import os
import fitz

def getjobcode(self, filename):
    if '.zip' in filename:
        with ZipFile(filename, 'r') as zipObj:
            for document in zipObj.namelist():
                if 'plans' in document.lower():
                    # Comment out the next line to try the relative path instead
                    document = os.path.join(self.prefs['download_path'], document)
                    doc = fitz.open(document)  # fails here with "no such file"
                    page1 = doc.load_page(0)
                    page1text = page1.get_text('text')
                    jobcode = page1text[page1text.index(
                        'PROJECT NUMBER'):page1text.index('PROJECT NUMBER') + 30][-12:]
                    return jobcode
Answers:
I ended up extracting the zip folder into the downloads folder, then parsing the PDF to get the data I needed. Afterwards I created a job folder where I wanted it and moved the extracted folder into it from the downloads folder.
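For anyone wanting to try the same route, here is a minimal sketch of the extract-then-move step using only the standard library. The function name extract_and_move and its three parameters are made up for illustration and are not from the original code; it also assumes the job folder does not already contain an item with the same name.

```python
import shutil
import zipfile
from pathlib import Path


def extract_and_move(zip_path, download_dir, job_dir):
    """Extract zip_path into download_dir, then move the extracted items into job_dir.

    Hypothetical helper: returns the list of moved top-level paths.
    """
    download_dir = Path(download_dir)
    job_dir = Path(job_dir)
    job_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path, 'r') as zip_obj:
        # Member names inside a zip always use '/', so the first component
        # is the top-level folder (or file) the archive will create.
        top_level = sorted({name.split('/')[0] for name in zip_obj.namelist()
                            if name.split('/')[0]})
        zip_obj.extractall(download_dir)

    # At this point the PDF is a real file under download_dir, so
    # fitz.open(path_to_pdf) can read it before the move below.

    moved = []
    for name in top_level:
        target = job_dir / name
        shutil.move(str(download_dir / name), str(target))
        moved.append(target)
    return moved
```

The downside of this route is that everything in the archive is written to disk just to read one page, which is why reading the PDF straight from the zip can be preferable.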
It works if you first read the PDF out of the zip into a BytesIO stream and pass that stream to fitz.open() (ChatGPT contributed to this answer).
import zipfile
import io
import fitz

# Open the ZIP file
with zipfile.ZipFile('example.zip', 'r') as myzip:
    # Get the PDF file as a bytes-like object
    pdf_data = io.BytesIO(myzip.read('example.pdf'))
    # Open the PDF file with PyMuPDF (fitz)
    with fitz.open(stream=pdf_data, filetype='pdf') as doc:
        # Do whatever you want with the PDF document
        # For example, you can get the number of pages:
        num_pages = len(doc)
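The reason the original attempts fail is that the names returned by namelist() are paths inside the archive, not files on disk, so fitz.open() has nothing to open at that location until the archive is extracted. Below is a rough sketch of how the stream approach might be folded back into the function from the question. The helper names find_member, extract_job_code and get_job_code are hypothetical, the '.pdf' suffix check is an added assumption, and the slicing simply mirrors the question's code. It assumes PyMuPDF is importable as fitz.

```python
import io
import zipfile


def find_member(zip_obj, keyword='plans'):
    """Return the first PDF member whose archive name contains keyword, else None."""
    for name in zip_obj.namelist():
        lowered = name.lower()
        if keyword in lowered and lowered.endswith('.pdf'):
            return name
    return None


def extract_job_code(page_text, marker='PROJECT NUMBER', window=30, code_len=12):
    """Same slicing as the question: the last code_len characters of the
    window-character span that starts at marker. Raises ValueError if the
    marker is not in the text."""
    start = page_text.index(marker)
    return page_text[start:start + window][-code_len:]


def get_job_code(zip_path):
    """Read page 1 of the matching PDF directly from the zip, without extracting."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    with zipfile.ZipFile(zip_path, 'r') as zip_obj:
        member = find_member(zip_obj)
        if member is None:
            return None
        # Pass the archive-internal name to read(), not a filesystem path.
        pdf_stream = io.BytesIO(zip_obj.read(member))

    with fitz.open(stream=pdf_stream, filetype='pdf') as doc:
        page_text = doc.load_page(0).get_text('text')
    return extract_job_code(page_text)
```

Keeping the member lookup and the text slicing in separate functions makes them easy to check without a real PDF; only get_job_code needs PyMuPDF.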