How to extract print scaling factor from a PDF file in Python?

Question:

I would like to extract the PDF scaling factor programmatically using Python. Specifically, I want to extract the scale factor that appears in the "Fit" under "Print Sizing & Handling" when printing the PDF file.

For example, if the "Fit" dropdown shows a scale of 24%, I want to extract the number 0.24.

I have looked at the PyPDF2 library, but I’m not sure which metadata field contains the print scaling factor. Does anyone know a solution to this problem? It does not necessarily have to use PyPDF2.

I have tried to extract the print scaling factor by measuring the dimensions of the PDF page and relating it to the dimension of A4 paper (21.59 by 27.94 in cm), but I’m not sure how to convert these dimensions to a scaling factor. I am almost certain that the output media_box.getWidth() or media_box.getHeight() is not in cm.

Here’s the code I tried:

import PyPDF2

with open('example.pdf', 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfFileReader(pdf_file)
    page = pdf_reader.getPage(0)
    media_box = page.mediaBox
    
    total_width = media_box.getWidth()
    total_height = media_box.getHeight()
    
    # calculate the scaling factor based on the ratio
    scale_factor = (total_width*total_height) / (21.59 * 27.94)
    
    print(scale_factor)
    # returns: 11402.877137305077 (which is clearly wrong)
Asked By: Josh


Answers:

According to the documentation, the page size is expressed in points (1/72 inch), which behave like pixels at 72 ppi. If you want to scale to a custom size, you can use that 72-per-inch figure to convert to inches, then to cm, and scale from there. However, if you plan to use a standard size such as A4, you can just use the dimensions stated in the docs: A4 = Dimensions(width=595, height=842).

I can’t test this since I don’t have the PDF file, but another thing to consider is how you compute the scale. To scale a page while maintaining its aspect ratio, I would use the smaller of the height ratio and the width ratio rather than multiplying the dimensions together and dividing. This leaves one dimension smaller than the target, but that can be padded to fit the target size.
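
As a rough sketch of the two ideas above (sizes in points at 72 per inch, and taking the smaller of the two ratios), something like the following could work. The helper names and the oversized page are hypothetical, and a real print dialog may also subtract printer margins, so the percentage it shows can differ slightly.

```python
POINTS_PER_INCH = 72
CM_PER_INCH = 2.54


def points_to_cm(points):
    """Convert PDF points (1/72 inch) to centimetres."""
    return points / POINTS_PER_INCH * CM_PER_INCH


def fit_scale(page_width, page_height, paper_width, paper_height):
    """Largest uniform scale at which the page still fits on the paper.

    All four arguments must be in the same unit (here: points).
    """
    return min(paper_width / page_width, paper_height / page_height)


A4_POINTS = (595, 842)  # same values as the A4 dimensions quoted above

# Hypothetical oversized page, chosen so the result lands near 24%.
big_page = (2480, 3508)
scale = fit_scale(*big_page, *A4_POINTS)
print(round(scale, 2))

# A4 width expressed in centimetres, for comparison with paper tables.
print(round(points_to_cm(A4_POINTS[0]), 2))
```

With a real file, the page width and height would come from the page's media box instead of the hard-coded tuple.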

Answered By: Shorn

I’m not 100% sure what you are trying to accomplish.

The code below is using the Python Package pypdf 3.5.2.

My PDF document was formatted for the A4 paper size.

I looked through the metadata fields for the PDF and I didn’t see any field related to print scaling factor.
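
That matches my understanding that the percentage shown next to "Fit" is computed by the viewer at print time from the page size and the selected paper, rather than stored in the file. The closest stored setting I am aware of is the optional /PrintScaling entry in the /ViewerPreferences dictionary of the document catalog, which only says whether the viewer should apply its default scaling (/AppDefault) or none (/None). Below is a minimal sketch of checking for it; the function name is hypothetical and the catalogs are plain stand-in dictionaries so the snippet runs without a PDF.

```python
def print_scaling_preference(catalog):
    """Return the /PrintScaling viewer preference, or None if it is absent.

    `catalog` is any mapping shaped like a PDF document catalog.
    """
    if "/ViewerPreferences" not in catalog:
        return None
    prefs = catalog["/ViewerPreferences"]
    if "/PrintScaling" not in prefs:
        return None
    return str(prefs["/PrintScaling"])


# Stand-in dictionaries (assumptions, not read from a real file).
catalog_with_pref = {"/ViewerPreferences": {"/PrintScaling": "/None"}}
catalog_without_pref = {"/Type": "/Catalog"}

print(print_scaling_preference(catalog_with_pref))     # /None
print(print_scaling_preference(catalog_without_pref))  # None
```

With pypdf, the catalog should be reachable as PdfReader(path).trailer["/Root"], though that is worth checking against the version you have installed. Either way, this yields a name, not a number like 0.24.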

from pypdf import PaperSize, PdfReader

with open('pdf_files/sample_a4.pdf', 'rb') as pdf_file:
    pdf_reader = PdfReader(pdf_file)
    page = pdf_reader.pages[0]

    # these dimensions are in points
    page_width = page.mediabox.width
    page_height = page.mediabox.height

    # A4 paper size
    paper_size_format_A4 = PaperSize.A4
    print('A4 paper size in points:')
    print(paper_size_format_A4)
    print('\n')
    print('A4 paper size in inches:  8.27 x 11.69')
    print(f'Width size in inches: {page_width / 72}')
    print(f'Height size in inches: {page_height / 72}')
    print('\n')
    print('A4 paper size in points:  595 × 842')
    print(f'Width size in Postscript points: {page_width}')
    print(f'Height size in Postscript points: {page_height}')
    print('\n')
    print('A4 paper size in centimeters:  21 x 29.7')
    print(f'Width size in centimeters: {round(2.54 * page_width / 72, 2)}')
    print(f'Height size in centimeters: {round(2.54 * page_height / 72, 2)}')
    print('\n')
    print('A4 paper size in millimetres:  210 x 297')
    print(f'Width size in millimeters: {round(25.40 * page_width / 72, 2)}')
    print(f'Height size in millimeters: {round(25.40 * page_height / 72, 2)}')

Print Output from the code above:

A4 paper size in points:
Dimensions(width=595, height=842)


A4 paper size in inches:  8.27 x 11.69
Width size in inches: 8.277777777777779
Height size in inches: 11.694444444444445


A4 paper size in points:  595 × 842
Width size in Postscript points: 596
Height size in Postscript points: 842


A4 paper size in centimeters:  21 x 29.7
Width size in centimeters: 21.03
Height size in centimeters: 29.7


A4 paper size in millimetres:  210 x 297
Width size in millimeters: 210.26
Height size in millimeters: 297.04

Here is an example for scaling a PDF to the A4 paper size.

from pypdf import PaperSize, PdfReader

with open('pdf_files/sample_a4.pdf', 'rb') as pdf_file:
    pdf_reader = PdfReader(pdf_file)
    page = pdf_reader.pages[0]
    
    # these dimensions are in points
    page_width = page.mediabox.width
    page_height = page.mediabox.height

    A4_width = PaperSize.A4.width
    A4_height = PaperSize.A4.height
    
    # resize page to fit *inside* A4
    scale_factor = min(A4_height / page_height, A4_width / page_width)

    print(scale_factor)
    # 0.9983221476510067

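If you instead express page_height and page_width in centimeters while leaving the A4 size in points, you get a "scaling factor" of 28.292914883499762. That number is essentially the points-per-centimeter conversion (72 / 2.54 ≈ 28.35), not a real scale, so keep both sizes in the same unit.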
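
To make the unit point concrete, here is a small sketch using the sample page size from the output above (596 x 842 points). The fit_scale helper is my own illustration, not part of pypdf.

```python
CM_PER_POINT = 2.54 / 72

page_pt = (596, 842)  # sample page from the output above, in points
a4_pt = (595, 842)    # A4 in points


def fit_scale(page, paper):
    """Smaller of the width and height ratios; both sizes as (width, height)."""
    return min(paper[0] / page[0], paper[1] / page[1])


# The same two sizes expressed in centimetres.
page_cm = tuple(v * CM_PER_POINT for v in page_pt)
a4_cm = tuple(v * CM_PER_POINT for v in a4_pt)

same_units_pt = fit_scale(page_pt, a4_pt)  # both in points
same_units_cm = fit_scale(page_cm, a4_cm)  # both in centimetres
mixed_units = fit_scale(page_cm, a4_pt)    # page in cm, paper in points

print(same_units_pt, same_units_cm, mixed_units)
```

The first two values agree up to floating-point rounding, while the mixed one is inflated by the points-per-centimeter factor.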

Here is another code example where I reformat the page.

from pypdf import PaperSize, PdfReader
from pypdf.generic import RectangleObject

with open('pdf_files/sample_a4.pdf', 'rb') as pdf_file:
    pdf_reader = PdfReader(pdf_file)
    page = pdf_reader.pages[0]

    A4_width = PaperSize.A4.width
    A4_height = PaperSize.A4.height
    page_bottom = page.mediabox.bottom
    page_left = page.mediabox.left

    page.mediabox = RectangleObject((page_left, page_bottom, A4_width, A4_height))
    page_width = page.mediabox.width
    page_height = page.mediabox.height
    
    # A4 paper size in points:  595 × 842
    print(page_width)
    # 595.0

    print(page_height)
    # 842.0

    scale_factor = min(A4_height / page_height, A4_width / page_width)
    print(scale_factor)
    # 1.0

The PDF used in the code above was allegedly A4. The one below is in Letter format (612 x 792). I use RectangleObject to set the page’s mediabox to 595 × 842. Note that this changes only the page box; it does not rescale the page content.

from pypdf import PaperSize, PdfReader
from pypdf.generic import RectangleObject

# A :class:`RectangleObject<pypdf.generic.RectangleObject>`, expressed in
# default user space units, defining the visible region of default user space.
# 
# When the page is displayed or printed, its contents are to be clipped (cropped) 
# to this rectangle and then imposed on the output medium in 
# some implementation-defined manner.  
# Default value: same as :attr:`mediabox<mediabox>`


# input file is a PDF with the format of
# Page size name: Letter
# Inches:  8.5 x 11
# Postscript units: 612 x 792
with open('pdf_files/sample_letter_format.pdf', 'rb') as pdf_file:
    pdf_reader = PdfReader(pdf_file)
    page = pdf_reader.pages[0]

    original_page_width = page.mediabox.width
    print(original_page_width)
    # 612

    original_page_height = page.mediabox.height
    print(original_page_height)
    # 792

    original_page_bottom = page.mediabox.bottom
    original_page_left = page.mediabox.left

    A4_width = PaperSize.A4.width
    A4_height = PaperSize.A4.height

    page.mediabox = RectangleObject((original_page_left, original_page_bottom, A4_width, A4_height))
    new_page_width = page.mediabox.width
    new_page_height = page.mediabox.height

    print(new_page_width)
    # 595.0

    print(new_page_height)
    # 842.0

    scale_factor = min(A4_height / original_page_height, A4_width / original_page_width)
    print(scale_factor)
    # 0.9722222222222222

Answered By: Life is complex