How to extract print scaling factor from a PDF file in Python?
Question:
I would like to extract the PDF scaling factor programmatically using Python. Specifically, I want to extract the scale factor that appears in the "Fit" under "Print Sizing & Handling" when printing the PDF file.
For example, if the "Fit" dropdown shows a scale of 24%, I want to extract the number 0.24.
I have looked at the PyPDF2 library, but I’m not sure which metadata field contains the print scaling factor. Does anyone know a solution to this problem? It does not necessarily have to use PyPDF2.
I have tried to extract the print scaling factor by measuring the dimensions of the PDF page and relating them to the dimensions of A4 paper (21.59 by 27.94 in cm), but I’m not sure how to convert these dimensions to a scaling factor. I am almost certain that the output of media_box.getWidth() or media_box.getHeight() is not in cm.
Here’s the code I tried:
import PyPDF2

with open('example.pdf', 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfFileReader(pdf_file)
    page = pdf_reader.getPage(0)
    media_box = page.mediaBox
    total_width = media_box.getWidth()
    total_height = media_box.getHeight()
    # calculate the scaling factor based on the ratio
    scale_factor = (total_width*total_height) / (21.59 * 27.94)
    print(scale_factor)
    # returns: 11402.877137305077 (which is clearly wrong)
Answers:
Looking at the documentation, the size is in points, at 72 points per inch (equivalent to pixels at 72 ppi). If you are trying to scale to a custom size, you could divide by 72 to get inches, convert the inches to cm, and then scale from there. However, if you are planning to use a standard size like A4, you could just use the dimensions stated in the docs: A4 = Dimensions(width=595, height=842).
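As a minimal sketch of that conversion (the helper names points_to_cm and cm_to_points are made up for illustration and are not part of PyPDF2 or pypdf), the point values can be turned into centimeters like this:

```python
POINTS_PER_INCH = 72   # PDF default user space: 1 point = 1/72 inch
CM_PER_INCH = 2.54


def points_to_cm(points: float) -> float:
    """Convert a length in PDF points to centimeters."""
    return points / POINTS_PER_INCH * CM_PER_INCH


def cm_to_points(cm: float) -> float:
    """Convert a length in centimeters to PDF points."""
    return cm / CM_PER_INCH * POINTS_PER_INCH


# A4 as listed in the docs quoted above: 595 x 842 points
a4_width_cm = points_to_cm(595)
a4_height_cm = points_to_cm(842)
print(round(a4_width_cm, 2), round(a4_height_cm, 2))
```

Both the page size and the target paper size need to be in the same unit before any ratio is taken.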
I can’t test it since I don’t have the pdf file to test the scaling, but another thing to consider is how you get your scale. If I’m trying to scale a page while maintaining the size ratio, I would probably use the lower scale amount between height and width rather than multiplying them together to divide. This would leave one of them at a smaller size than target, but that can be padded to fit the target size.
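The "lower scale amount" idea can be sketched as follows. This is only an illustration on plain numbers: fit_scale and centering_padding are hypothetical helper names, and the sketch assumes the target is the full paper size. The percentage shown in a print dialog may also depend on the printer's printable area, which is ignored here.

```python
def fit_scale(page_w, page_h, target_w, target_h):
    """Largest uniform scale at which the page still fits inside the target."""
    return min(target_w / page_w, target_h / page_h)


def centering_padding(page_w, page_h, target_w, target_h):
    """Return (scale, pad_x, pad_y), where pad_x and pad_y are the margins
    left on each side after scaling and centering the page on the target."""
    scale = fit_scale(page_w, page_h, target_w, target_h)
    pad_x = (target_w - page_w * scale) / 2
    pad_y = (target_h - page_h * scale) / 2
    return scale, pad_x, pad_y


# Letter (612 x 792 points) placed on A4 (595 x 842 points)
scale, pad_x, pad_y = centering_padding(612, 792, 595, 842)
print(scale, pad_x, pad_y)
```

Here the width is the limiting dimension, so the scaled page fills the target horizontally and the leftover height is split between the top and bottom padding.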
I’m not 100% sure what you are trying to accomplish.
The code below is using the Python Package pypdf 3.5.2.
My PDF document was formatted for the A4 paper size.
I looked through the metadata fields for the PDF and I didn’t see any field related to a print scaling factor.
from pypdf import PaperSize, PdfReader

with open('pdf_files/sample_a4.pdf', 'rb') as pdf_file:
    pdf_reader = PdfReader(pdf_file)
    page = pdf_reader.pages[0]

    # these dimensions are in points
    page_width = page.mediabox.width
    page_height = page.mediabox.height

    # A4 paper size
    paper_size_format_A4 = PaperSize.A4
    print('A4 paper size in points:')
    print(paper_size_format_A4)
    print('\n')
    print('A4 paper size in inches: 8.27 x 11.69')
    print(f'Width size in inches: {page_width / 72}')
    print(f'Height size in inches: {page_height / 72}')
    print('\n')
    print('A4 paper size in points: 595 × 842')
    print(f'Width size in Postscript points: {page_width}')
    print(f'Height size in Postscript points: {page_height}')
    print('\n')
    print('A4 paper size in centimeters: 21 x 29.7')
    print(f'Width size in centimeters: {round(2.54 * page_width / 72, 2)}')
    print(f'Height size in centimeters: {round(2.54 * page_height / 72, 2)}')
    print('\n')
    print('A4 paper size in millimetres: 210 x 297')
    print(f'Width size in millimeters: {round(25.40 * page_width / 72, 2)}')
    print(f'Height size in millimeters: {round(25.40 * page_height / 72, 2)}')
Print Output from the code above:
A4 paper size in points:
Dimensions(width=595, height=842)
A4 paper size in inches: 8.27 x 11.69
Width size in inches: 8.277777777777779
Height size in inches: 11.694444444444445
A4 paper size in points: 595 × 842
Width size in Postscript points: 596
Height size in Postscript points: 842
A4 paper size in centimeters: 21 x 29.7
Width size in centimeters: 21.03
Height size in centimeters: 29.7
A4 paper size in millimetres: 210 x 297
Width size in millimeters: 210.26
Height size in millimeters: 297.04
Here is an example for scaling a PDF to the A4 paper size.
from pypdf import PaperSize, PdfReader

with open('pdf_files/sample_a4.pdf', 'rb') as pdf_file:
    pdf_reader = PdfReader(pdf_file)
    page = pdf_reader.pages[0]

    # these dimensions are in points
    page_width = page.mediabox.width
    page_height = page.mediabox.height

    A4_width = PaperSize.A4.width
    A4_height = PaperSize.A4.height

    # resize page to fit *inside* A4
    scale_factor = min(A4_height / page_height, A4_width / page_width)
    print(scale_factor)
    # output: 0.9983221476510067
If you change page_height and page_width to use centimeters while the A4 dimensions stay in points, the units no longer match and you will get a scaling factor of ‘28.292914883499762’.
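A small sketch of why the units matter, using the 596 x 842 point page size from the sample output above (fit_scale is a hypothetical helper, not a pypdf function): the ratio is the same in points and in centimeters, and only goes wrong when the two are mixed.

```python
CM_PER_POINT = 2.54 / 72


def fit_scale(page, target):
    """Uniform scale that fits page (w, h) inside target (w, h)."""
    return min(target[0] / page[0], target[1] / page[1])


page_pts = (596, 842)  # page size from the sample output above
a4_pts = (595, 842)    # A4 in points

# Same unit on both sides: points vs. points, cm vs. cm
scale_pts = fit_scale(page_pts, a4_pts)
page_cm = tuple(v * CM_PER_POINT for v in page_pts)
a4_cm = tuple(v * CM_PER_POINT for v in a4_pts)
scale_cm = fit_scale(page_cm, a4_cm)

# Mixed units: page in (rounded) centimeters, A4 still in points
page_cm_rounded = tuple(round(v, 2) for v in page_cm)
scale_mixed = fit_scale(page_cm_rounded, a4_pts)

print(scale_pts, scale_cm, scale_mixed)
```

The first two values agree up to floating-point error, while the mixed-unit value is off by roughly the points-per-centimeter factor.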
Here is another code example where I reformat the page.
from pypdf import PaperSize, PdfReader
from pypdf.generic import RectangleObject

with open('pdf_files/sample_a4.pdf', 'rb') as pdf_file:
    pdf_reader = PdfReader(pdf_file)
    page = pdf_reader.pages[0]

    A4_width = PaperSize.A4.width
    A4_height = PaperSize.A4.height

    page_bottom = page.mediabox.bottom
    page_left = page.mediabox.left

    # replace the media box with an A4-sized rectangle
    page.mediabox = RectangleObject((page_left, page_bottom, A4_width, A4_height))
    page_width = page.mediabox.width
    page_height = page.mediabox.height

    # A4 paper size in points: 595 × 842
    print(page_width)
    # output: 595.0
    print(page_height)
    # output: 842.0

    scale_factor = min(A4_height / page_height, A4_width / page_width)
    print(scale_factor)
    # output: 1.0
The PDF used in the code above was allegedly A4. The one below is in Letter format, 612 x 792. I use RectangleObject to reformat the page to 595 × 842. Note that this only changes the page box; the page content itself is not rescaled.
from pypdf import PaperSize, PdfReader
from pypdf.generic import RectangleObject

# A :class:`RectangleObject<pypdf.generic.RectangleObject>`, expressed in
# default user space units, defining the visible region of default user space.
#
# When the page is displayed or printed, its contents are to be clipped (cropped)
# to this rectangle and then imposed on the output medium in
# some implementation-defined manner.
# Default value: same as :attr:`mediabox<mediabox>`

# input file is a PDF with the format of
# Page size name: Letter
# Inches: 8.5 x 11
# Postscript units: 612 x 792
with open('pdf_files/sample_letter_format.pdf', 'rb') as pdf_file:
    pdf_reader = PdfReader(pdf_file)
    page = pdf_reader.pages[0]

    original_page_width = page.mediabox.width
    print(original_page_width)
    # output: 612
    original_page_height = page.mediabox.height
    print(original_page_height)
    # output: 792

    original_page_bottom = page.mediabox.bottom
    original_page_left = page.mediabox.left

    A4_width = PaperSize.A4.width
    A4_height = PaperSize.A4.height

    # replace the Letter media box with an A4-sized rectangle
    page.mediabox = RectangleObject((original_page_left, original_page_bottom, A4_width, A4_height))
    new_page_width = page.mediabox.width
    new_page_height = page.mediabox.height
    print(new_page_width)
    # output: 595.0
    print(new_page_height)
    # output: 842.0

    # scale needed to fit the original Letter page inside A4
    scale_factor = min(A4_height / original_page_height, A4_width / original_page_width)
    print(scale_factor)
    # output: 0.9722222222222222