PDF generation from a list of images takes too long – Python

Question:

I’m trying to generate a PDF from a list of 3 images, but this step is a bottleneck in my program, taking up to 30 seconds per PDF. I need to process a very large number of images, so this is far too slow. None of the solutions I have tried so far have helped much. The three images I’m testing with are 60 KB, 125 KB and 134 KB respectively.

I’ve tried using PIL, which takes around 27 seconds per PDF. I used the following code:

import os
from PIL import Image

def pil_pdf():  # 27 sec
    downloads = r"C:\Users\USER\Downloads"
    file_nmbr = 3
    imagelist = []
    for i in range(1, file_nmbr + 1):
        # Drop any alpha channel before writing to PDF
        current_image = Image.open(os.path.join(downloads, f"{i}.png")).convert("RGB")
        imagelist.append(current_image)

    out_path = os.path.join(downloads, "out_vPIL.pdf")
    # save_all with append_images writes one page per image
    imagelist[0].save(out_path, save_all=True, append_images=imagelist[1:])

… as well as with FPDF:

import os
from fpdf import FPDF

def new_pdf():  # 25 sec
    downloads = r"C:\Users\USER\Downloads"
    file_nmbr = 3
    imagelist = []
    for i in range(1, file_nmbr + 1):
        imagelist.append(os.path.join(downloads, f"{i}.png"))

    pdf = FPDF()
    for image in imagelist:
        pdf.add_page()
        # Stretch each image over a full A4 page (210 x 297 mm)
        pdf.image(image, 0, 0, 210, 297)

    pdf.output(os.path.join(downloads, "out.pdf"))

I’d like to bring the time down to about 10 seconds per PDF, but so far nothing I have found has helped. Any advice would be extremely welcome.

Thanks so much for any suggestions or recommendations!

Asked By: h311p0w5

Answers:

Let me take a guess: you will most likely see the best performance with PyMuPDF:

import fitz  # import PyMuPDF

imglist = [...]  # your list of image filenames
doc = fitz.open()  # new empty PDF

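for ifile in imglist:
    idoc = fitz.open(ifile)  # open the image file as a document
    pdfbytes = idoc.convert_to_pdf()  # convert it to a one-page PDF in memory
    doc.insert_pdf(fitz.open("pdf", pdfbytes))  # append that page to the output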

doc.save("myimages.pdf", garbage=3, deflate=True)

Answered By: Jorj McKie
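
To compare the three approaches above on the same inputs, it may help to time them the same way. The following is a minimal sketch using only the standard library. The names `time_call` and `build_pdf_stub` are hypothetical and not part of PIL, FPDF or PyMuPDF; the stub only sleeps and stands in for whichever PDF function you want to measure.

```python
import time
from typing import Any, Callable, Tuple


def time_call(func: Callable[..., Any], *args: Any, repeats: int = 3,
              **kwargs: Any) -> Tuple[float, Any]:
    """Call func `repeats` times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result


def build_pdf_stub(n_images: int) -> int:
    """Hypothetical placeholder for pil_pdf(), new_pdf() or the PyMuPDF loop."""
    time.sleep(0.01)  # pretend to do some work
    return n_images


if __name__ == "__main__":
    seconds, pages = time_call(build_pdf_stub, 3)
    print(f"{pages} pages in {seconds:.3f} s")
```

Replacing `build_pdf_stub` with `pil_pdf`, `new_pdf` or a function wrapping the PyMuPDF loop should give comparable numbers. Taking the best of several runs reduces the influence of one-off effects such as a cold disk cache.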