Convert PDF page to image with pyPDF2 and BytesIO

Question:

I have a function that gets a page from a PDF file via pyPdf2 and should convert the first page to a png (or jpg) with Pillow (PIL Fork)

from PyPDF2 import PdfFileWriter, PdfFileReader
import os
from PIL import Image
import io

# Open PDF Source #
app_path = os.path.dirname(__file__)
src_pdf= PdfFileReader(open(os.path.join(app_path, "../../../uploads/%s" % filename), "rb"))

# Get the first page of the PDF #
dst_pdf = PdfFileWriter()
dst_pdf.addPage(src_pdf.getPage(0))

# Create BytesIO #
pdf_bytes = io.BytesIO()
dst_pdf.write(pdf_bytes)
pdf_bytes.seek(0)

file_name = "../../../uploads/%s_p%s.png" % (name, pagenum)
img = Image.open(pdf_bytes)
img.save(file_name, 'PNG')
pdf_bytes.flush()

That results in an error:

OSError: cannot identify image file <_io.BytesIO object at 0x0000023440F3A8E0>

I found some threads with a similar issue, (PIL open() method not working with BytesIO) but I cannot see where I am wrong here, as I have pdf_bytes.seek(0) already added.

Any hints appreciated

Asked By: PrimuS

||

Answers:

Per document:

write(stream) Writes the collection of pages added to this object out
as a PDF file.

Parameters: stream – An object to write the file to. The object must
support the write method and the tell method, similar to a file
object.

So the object pdf_bytes contains a PDF file, not an image file.

The reason why there are codes like above work is: sometimes, the pdf file just contains a jpeg file as its content. If your pdf is just a normal pdf file, you can’t just read the bytes and parse it as an image.

And refer to as a more robust implementation: https://stackoverflow.com/a/34116472/334999

Answered By: Shuo
import glob, sys, fitz

# To get better resolution
zoom_x = 2.0  # horizontal zoom
zoom_y = 2.0  # vertical zoom
mat    = fitz.Matrix(zoom_x, zoom_y)  # zoom factor 2 in each dimension


filename = "/xyz/abcd/1234.pdf"  # name of pdf file you want to render
doc = fitz.open(filename)
for page in doc:
    pix = page.get_pixmap(matrix=mat)  # render page to an image
    pix.save("/xyz/abcd/1234.png")  # store image as a PNG

Hi,

This is to request that I am yet to receive the company ID card. My wife is having a delivery in 2 days. As a part of the insurance process hospital team asked me to submit the company ID Card.

Can you please provide me the ID Card ASAP, I am ready to come and collect the office on tomorrow. If physical card is not possible can you please send me the ID card by scanning.

Your help is much appreciated in this tough situation.

Credit

Convert PDF to Image in Python Using PyMuPDF

Answered By: thrinadhn
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.