Opening a PDF within a zip folder with fitz.open()

Question:

I have a function that opens a zip file, finds a PDF with a given filename, then reads the first page of the PDF to get some specific text. My issue is that after I locate the correct file, I can't open it to read it. I have tried both a relative path within the zip folder and an absolute path in my Downloads folder, and I keep getting one of these errors:
no such file: 'Deliverables_Rev B\Plans_Rev B.pdf'
no such file: 'C:\Users\MyProfile\Downloads\Deliverables_Rev B\Plans_Rev B.pdf'

I have been commenting the os.path.join line in and out to switch between the relative and absolute paths; self.prefs['download_path'] returns my Downloads folder.
I'm not sure what the issue with the relative path is, and any insight would be helpful. I think it has to do with trying to read a file out of a zipped folder.

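import os
from zipfile import ZipFile  # `import zipfile as ZipFile` binds the module, which is not callable
import fitz  # PyMuPDF

def getjobcode(self, filename):
    jobcode = None  # avoid UnboundLocalError when no matching PDF is found
    if '.zip' in filename:
        with ZipFile(filename, 'r') as zipObj:
            for document in zipObj.namelist():
                if 'plans' in document.lower():
                    # Comment this line in/out to try the absolute vs. relative path
                    document = os.path.join(self.prefs['download_path'], document)
                    doc = fitz.open(document)  # raises "no such file" with either path
                    page1 = doc.load_page(0)
                    page1text = page1.get_text('text')
                    # Take 30 characters starting at 'PROJECT NUMBER', keep the last 12
                    jobcode = page1text[page1text.index(
                        'PROJECT NUMBER'):page1text.index('PROJECT NUMBER') + 30][-12:]
    return jobcode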
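A quick way to test the suspicion about the zipped folder: ZipFile.namelist() returns the names of members stored inside the archive, not paths on disk. The following minimal sketch uses only the standard library; the temp folder standing in for Downloads and the archive name Deliverables.zip are hypothetical, and the member name just mirrors the error messages. It checks whether anything exists at either the bare member name or the name joined onto the download folder:

```python
import os
import tempfile
import zipfile

# Hypothetical stand-ins: a temp folder plays the role of the Downloads
# folder, and the member name mirrors the one in the error messages.
downloads = tempfile.mkdtemp()
zip_path = os.path.join(downloads, "Deliverables.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("Deliverables_Rev B/Plans_Rev B.pdf", b"%PDF-1.4 placeholder")

checks = []
with zipfile.ZipFile(zip_path) as zf:
    for member in zf.namelist():
        # namelist() gives names stored *inside* the archive, not disk paths
        relative_exists = os.path.exists(member)
        joined_exists = os.path.exists(os.path.join(downloads, member))
        checks.append((member, relative_exists, joined_exists))

for member, rel, joined in checks:
    print(f"{member!r}: relative exists={rel}, joined exists={joined}")
```

Both checks come back False: until the archive is extracted, the PDF exists only as compressed bytes inside the zip file, so a path-based fitz.open() has no file to find at either location.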
Asked By: Ryan


Answers:

I ended up extracting the zip folder into the Downloads folder, then parsing the PDF to get the data I needed. Afterwards, I created a job folder where I wanted it and moved the extracted folder into it from Downloads.
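A minimal sketch of that extract-then-parse workflow, using zipfile and os for the extraction. The helper names, the `keyword` parameter (modelled on the question's `'plans' in document.lower()` test), the extra `.pdf` check, and the demo archive are assumptions for illustration; `first_page_text` needs PyMuPDF installed:

```python
import os
import tempfile
import zipfile


def extract_and_find_pdf(zip_path, dest_dir, keyword="plans"):
    """Extract the whole archive into dest_dir, then return the on-disk path
    of the first .pdf member whose name contains `keyword` (case-insensitive),
    or None if nothing matches. Hypothetical helper, not a library API."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)  # unpack everything first; creates dest_dir if needed
        for member in zf.namelist():
            name = member.lower()
            if keyword in name and name.endswith(".pdf"):
                # Member names always use "/", so rebuild the path with the
                # local separator before handing it to other tools
                return os.path.join(dest_dir, *member.split("/"))
    return None


def first_page_text(pdf_path):
    """Return the text of page 1, using the same calls as the question."""
    import fitz  # PyMuPDF; imported here so extraction works without it
    with fitz.open(pdf_path) as doc:
        return doc.load_page(0).get_text("text")


# Demo with a throwaway archive (names are illustrative only).
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "Deliverables.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("Deliverables_Rev B/Plans_Rev B.pdf", b"%PDF-1.4 placeholder")
    zf.writestr("Deliverables_Rev B/Notes.txt", b"not a pdf")

pdf_path = extract_and_find_pdf(demo_zip, os.path.join(work_dir, "extracted"))
print(pdf_path)
```

Because the PDF now exists as a real file, fitz.open() accepts the returned path; moving the extracted folder into a job folder is then an ordinary file-system step.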

Answered By: Ryan

It works if you first read the PDF out of the archive into an io.BytesIO stream and open that stream with fitz (ChatGPT contributed to this answer).

import zipfile
import io
import fitz

# Open the ZIP file
with zipfile.ZipFile('example.zip', 'r') as myzip:
    # Get the PDF file as a bytes-like object
    pdf_data = io.BytesIO(myzip.read('example.pdf'))

# Open the PDF file with PyMuPDF (fitz)
with fitz.open(stream=pdf_data, filetype='pdf') as doc:
    # Do whatever you want with the PDF document
    # For example, you can get the number of pages:
    num_pages = len(doc)
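Applying the same in-memory idea to the question's function might look like the following minimal sketch. Assumptions: `get_job_code`, `job_code_from_text`, and their parameters are hypothetical names; the bytes from `zf.read()` are passed to `fitz.open(stream=...)` directly (PyMuPDF accepts bytes as well as a BytesIO object); it stops at the first matching PDF instead of looping over every match; and the slice width (30) and tail length (12) are copied from the question:

```python
import zipfile


def job_code_from_text(text, label="PROJECT NUMBER", width=30, tail=12):
    """Same slicing as the question: take `width` characters starting at
    `label` and keep the last `tail`. Returns None when `label` is missing
    (str.index() in the question would raise ValueError instead)."""
    start = text.find(label)
    if start == -1:
        return None
    return text[start:start + width][-tail:]


def get_job_code(zip_path, keyword="plans"):
    """Open the first matching PDF straight from the archive, without
    extracting it to disk, and return the job code from page 1 (or None)."""
    import fitz  # PyMuPDF
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            name = member.lower()
            if keyword in name and name.endswith(".pdf"):
                pdf_bytes = zf.read(member)  # decompressed member contents
                with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
                    return job_code_from_text(doc.load_page(0).get_text("text"))
    return None


sample = "TITLE SHEET\nPROJECT NUMBER....ABCDEFGHIJKL\nSHEET 1"
print(job_code_from_text(sample))  # ABCDEFGHIJKL
```

Splitting the text slicing into its own function keeps it testable without a real PDF, and using str.find() instead of str.index() turns a missing label into None rather than an exception.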
Answered By: Danferno