pypdf gives output with incorrect PDF format

Question

I am using the following code to resize pages in a PDF:

from pypdf import PdfReader, PdfWriter, Transformation, PageObject, PaperSize
from pypdf.generic import RectangleObject

reader = PdfReader("input.pdf")
writer = PdfWriter()

A4_w = PaperSize.A4.width
A4_h = PaperSize.A4.height

for page in reader.pages:
    # resize page to fit *inside* A4
    h = float(page.mediabox.height)
    w = float(page.mediabox.width)
    scale_factor = min(A4_h / h, A4_w / w)

    # scale the page, then shift it up so it is vertically centered on A4
    transform = Transformation().scale(scale_factor, scale_factor).translate(0, A4_h / 2 - h * scale_factor / 2)
    page.add_transformation(transform)

    page.cropbox = RectangleObject((0, 0, A4_w, A4_h))

    # merge the pages to fit inside A4

    # prepare A4 blank page
    page_A4 = PageObject.create_blank_page(width=A4_w, height=A4_h)
    page.mediabox = page_A4.mediabox
    page_A4.merge_page(page)

    writer.add_page(page_A4)
writer.write("output.pdf")

Source: https://stackoverflow.com/a/75274841/11501160
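
The resizing arithmetic in the loop above can be checked on its own, without any PDF. This is a minimal sketch using a hypothetical helper `fit_inside` (not part of pypdf), assuming A4 is 595 x 842 points, which are the values pypdf's PaperSize.A4 uses. Note that the question's code only translates vertically, so when height is the limiting dimension the page stays flush left; the `dx` value below covers that case.

```python
from typing import Tuple


def fit_inside(w: float, h: float, box_w: float, box_h: float) -> Tuple[float, float, float]:
    """Return (scale, dx, dy) that scales a w x h page to fit inside a
    box_w x box_h page, preserving aspect ratio, and centers the result."""
    # The smaller ratio is the limiting dimension, as in the question's min(...)
    scale = min(box_h / h, box_w / w)
    # Horizontal offset; the question's code always uses 0 here
    dx = (box_w - w * scale) / 2
    # Vertical offset; equals A4_h/2 - h*scale_factor/2 in the question
    dy = (box_h - h * scale) / 2
    return scale, dx, dy


# US Letter (612 x 792 pt) into A4, assumed here to be 595 x 842 pt
scale, dx, dy = fit_inside(612, 792, 595, 842)
print(f"scale={scale:.4f} dx={dx:.2f} dy={dy:.2f}")
```

For a Letter page the width is the limiting dimension, so `dx` is essentially 0 and the vertical-only translation is enough; for a page that is tall relative to A4, a non-zero `dx` would be needed to center it horizontally.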

This code handles the resizing correctly, and most input files come out fine, but some do not.

I am providing download links to input.pdf and output.pdf for testing and review. The output file is completely different from the input file: the images are missing, the background colour is different, and even on the first page, which is plain text, only the first line is visible.

Interestingly, these differences appear only when I open the output PDF in Adobe Acrobat or look at the printed pages.
The PDF looks perfect when I open it in Preview (on macOS) or in my Chrome browser.

I created the input PDF in Preview (on macOS) by combining pages from different PDFs and dragging image files into the thumbnails, following these instructions:
https://support.apple.com/en-ca/HT202945
I have never had a problem making PDFs this way, and Adobe Acrobat reads the input PDF correctly. Only the output PDF is problematic, both in Acrobat and on printers.

Is this a bug in pypdf, or am I doing something wrong?
How can I get the output PDF to display correctly in Adobe Acrobat and print correctly?

Asked By: Zain Khaishagi


Answer 1

Here is how to do this with PyMuPDF instead. The output displays correctly in all PDF readers:

import fitz  # import PyMuPDF

src = fitz.open("input.pdf")
doc = fitz.open()
for i in range(len(src)):
    page = doc.new_page()  # this is A4 portrait by default
    page.show_pdf_page(page.rect, src, i)  # scaling will happen automatically
doc.save("fitz-output.pdf", garbage=3, deflate=True)  # garbage=3: drop unused objects, merge duplicates; deflate: compress streams

The show_pdf_page() method used above supports many more options, such as selecting a sub-rectangle of the source page, rotating it by an arbitrary angle, and freely choosing the sub-rectangle of the target page that receives the content.
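
As a rough illustration of the geometry behind "scaling will happen automatically" when a rotation is also requested (PyMuPDF exposes this via the `rotate` parameter of show_pdf_page): what has to fit inside the target rectangle is the axis-aligned bounding box of the rotated source page. The helper `rotated_fit_scale` below is hypothetical, not part of PyMuPDF, and only sketches that calculation; PyMuPDF's own implementation may differ in detail.

```python
import math


def rotated_fit_scale(src_w: float, src_h: float, dst_w: float, dst_h: float, angle_deg: float) -> float:
    """Largest scale at which a src_w x src_h rectangle, rotated by angle_deg
    about its center, still fits inside a dst_w x dst_h rectangle."""
    a = math.radians(angle_deg)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    # Axis-aligned bounding box of the rotated source rectangle
    bbox_w = src_w * c + src_h * s
    bbox_h = src_w * s + src_h * c
    # Keep the aspect ratio: the tighter of the two ratios wins
    return min(dst_w / bbox_w, dst_h / bbox_h)


# A Letter page (612 x 792 pt) onto an A4 page (595 x 842 pt assumed) at a few angles
for angle in (0, 30, 90):
    print(angle, round(rotated_fit_scale(612, 792, 595, 842, angle), 4))
```

At 0 degrees this reduces to the same min(...) ratio used in the question's pypdf code; at 90 degrees the width and height of the source swap roles.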

Answered By: Jorj McKie

Answer 2

This is a genuine bug in pypdf, and the fix is due to be released in the next version.
See:
https://github.com/py-pdf/pypdf/issues/1607

Answered By: Zain Khaishagi
