pypdf gives output with incorrect PDF format

Question:

I am using the following code to resize pages in a PDF:

from pypdf import PdfReader, PdfWriter, Transformation, PageObject, PaperSize
from pypdf.generic import RectangleObject

reader = PdfReader("input.pdf")
writer = PdfWriter()
for page in reader.pages:
  A4_w = PaperSize.A4.width
  A4_h = PaperSize.A4.height

  # resize page to fit *inside* A4
  h = float(page.mediabox.height)
  w = float(page.mediabox.width)
  scale_factor = min(A4_h/h, A4_w/w)

  transform = Transformation().scale(scale_factor,scale_factor).translate(0, A4_h/2 - h*scale_factor/2)
  page.add_transformation(transform)

  page.cropbox = RectangleObject((0, 0, A4_w, A4_h))

  # merge the pages to fit inside A4

  # prepare A4 blank page
  page_A4 = PageObject.create_blank_page(width = A4_w, height = A4_h)
  page.mediabox = page_A4.mediabox
  page_A4.merge_page(page)

  writer.add_page(page_A4)
writer.write('output.pdf')

Source: https://stackoverflow.com/a/75274841/11501160
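
For reference, the fit-inside-A4 arithmetic in the loop above can be checked on its own, without pypdf. This is only a minimal sketch: the helper name fit_inside is hypothetical, A4 is assumed to be 595 x 842 points, and the horizontal offset dx is an addition for illustration (the code above only centers vertically).

```python
# Assumed A4 size in PDF points (1/72 inch), rounded to whole points.
A4_W, A4_H = 595.0, 842.0


def fit_inside(w, h, target_w=A4_W, target_h=A4_H):
    """Return (scale, dx, dy) that fits a w x h page inside the target.

    scale is the uniform scale factor; dx and dy are the offsets that
    center the scaled page. fit_inside is a hypothetical helper, not
    part of pypdf.
    """
    if w <= 0 or h <= 0:
        raise ValueError("page dimensions must be positive")
    # Use the smaller ratio so the page fits in both directions.
    scale = min(target_w / w, target_h / h)
    # Split the leftover space evenly on each axis.
    dx = (target_w - w * scale) / 2
    dy = (target_h - h * scale) / 2
    return scale, dx, dy


# US Letter (612 x 792 points) is limited by its width on A4.
scale, dx, dy = fit_inside(612, 792)
print(f"scale={scale:.4f} dx={dx:.2f} dy={dy:.2f}")
```

The dy value corresponds to the A4_h/2 - h*scale_factor/2 term passed to translate() above.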

The resizing itself works, and most input files come out fine, but some input files produce a broken output.

I am providing download links to the input.pdf and output.pdf files for testing and review. The output file is completely different from the input file: the images are missing, the background colour is different, and even the plain text on the first page has only its first line visible.

What is interesting is that these differences are only seen when I open the output PDF in Adobe Acrobat, or look at the physically printed pages.
The PDF looks perfect when I open it in Preview (on macOS) or in my Chrome browser.

input file and output file

I created the input PDF in Preview (on macOS) by mixing pages from different PDFs and dragging image files into the thumbnails, as per these instructions:
https://support.apple.com/en-ca/HT202945
I’ve never had a problem making PDFs like this before, and even Adobe Acrobat reads the input PDF properly. Only the output PDF is problematic in Acrobat and on printers.

Is this a bug in pypdf, or am I doing something wrong?
How can I get the output PDF to display properly in Adobe Acrobat and on printers?

Asked By: Zain Khaishagi


Answers:

The following is what PyMuPDF has to offer here. The output displays correctly in all PDF readers:

import fitz  # import PyMuPDF

src = fitz.open("input.pdf")
doc = fitz.open()
for i in range(len(src)):
    page = doc.new_page()  # this is A4 portrait by default
    page.show_pdf_page(page.rect, src, i)  # scaling will happen automatically
doc.save("fitz-output.pdf", garbage=3, deflate=True)

The show_pdf_page() method used above supports many more options, such as selecting sub-rectangles from the source page, rotating it by arbitrary angles, and of course freely selecting the sub-rectangle of the target page that receives the content.

Answered By: Jorj McKie

This is a genuine bug in pypdf, and the fix is due to be released in the next version.
See:
https://github.com/py-pdf/pypdf/issues/1607

Answered By: Zain Khaishagi