How does pyPdf understand document boundaries?

Question:

I found this code for splitting a PDF page:

#!/usr/bin/env python
import copy, sys
from pyPdf import PdfFileWriter, PdfFileReader
input = PdfFileReader(sys.stdin)
output = PdfFileWriter()
for p in [input.getPage(i) for i in range(0,input.getNumPages())]:
    q = copy.copy(p)
    (w, h) = p.mediaBox.upperRight
    p.mediaBox.upperRight = (w/2, h)
    q.mediaBox.upperLeft = (w/2, h)
    output.addPage(p)
    output.addPage(q)
output.write(sys.stdout)

If one page contains four other pages, like this:

+-------+-------+
|   1   |   2   |
|-------+-------|
|   3   |   4   |
+-------+-------+

then the code splits it into two pages (in this order), each containing two of the other pages:

+-------+-------+
|   3   |   4   |
+-------+-------+

+-------+-------+
|   1   |   2   |
+-------+-------+

You can test it on, e.g., the following document. If I understand the upperRight, upperLeft (and other) variables in the code correctly, this is how pyPdf sees the document:

UL(0,10)        UR(10,10)
+-------+-------+
|   1   |   2   |
|-------+-------|
|   3   |   4   |
+-------+-------+
LL(0,0)         LR(10,0)

UL(x,y) = UpperLeft
UR(x,y) = UpperRight
LL(x,y) = LowerLeft
LR(x,y) = LowerRight

Given these lines from the code above:

(w, h) = p.mediaBox.upperRight
p.mediaBox.upperRight = (w/2, h)
q.mediaBox.upperLeft = (w/2, h)

I was expecting this output:

p:
+-------+
|   1   |
|-------+
|   3   |
+-------+

q:
+-------+
|   2   |
|-------+
|   4   |
+-------+

What am I missing here?

Asked By: Wakan Tanka


Answers:

In PDF there are two ways to get a landscape page:

  1. Define a page with width > height.
  2. Define a portrait page (width < height) and a rotation (90 degrees, 270 degrees, etc).

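Your sample PDF uses the second way: all the pages are 595×842 with a rotation of 270 degrees. The mediaBox coordinates refer to the unrotated page, so halving the width cuts the displayed page into a top half and a bottom half instead of a left half and a right half. Not taking the rotation into account causes vertical to be interpreted as horizontal and vice versa.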
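
As a rough illustration of taking the rotation into account, here is a minimal sketch in plain Python (no pyPdf needed to run it). The function name split_left_right and the (llx, lly, urx, ury) tuple layout are my own, hypothetical choices; the only thing assumed about the PDF is that the rotation is a clockwise multiple of 90 degrees and that the box is given in unrotated coordinates.

```python
def split_left_right(box, rotation):
    """Return (left, right) half-boxes as the page is displayed.

    box is (llx, lly, urx, ury) in unrotated PDF coordinates;
    rotation is the page rotation in clockwise degrees.
    """
    llx, lly, urx, ury = box
    mid_x = (llx + urx) / 2.0
    mid_y = (lly + ury) / 2.0
    rotation %= 360
    if rotation == 0:
        # No rotation: split along x, as the original script does.
        return (llx, lly, mid_x, ury), (mid_x, lly, urx, ury)
    if rotation == 90:
        # The unrotated bottom edge is displayed on the left.
        return (llx, lly, urx, mid_y), (llx, mid_y, urx, ury)
    if rotation == 180:
        # Upside down: the unrotated right half is displayed on the left.
        return (mid_x, lly, urx, ury), (llx, lly, mid_x, ury)
    if rotation == 270:
        # The unrotated top edge is displayed on the left.
        return (llx, mid_y, urx, ury), (llx, lly, urx, mid_y)
    raise ValueError("rotation must be a multiple of 90")


# The sample document: 595x842 pages rotated by 270 degrees.
left, right = split_left_right((0, 0, 595, 842), 270)
print(left)   # (0, 421.0, 595, 842)
print(right)  # (0, 0, 595, 421.0)
```

In the original script you would then probably read the rotation with something like p.get('/Rotate', 0) (an assumption on my part: it only works if the rotation is stored on the page itself and not inherited from a parent node) and assign the resulting corners to mediaBox.lowerLeft and mediaBox.upperRight of p and q instead of halving w.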

Answered By: rhens