Creating a small pdf from JPGs to be web optimized
Question:
I have a folder of around 2000 images, approximately 150 KB each, with a resolution of 96 dpi. I want to create a PDF from these images that is as small as possible. I am using PyPDF2 to create the PDF, but the result is a very large file, approximately 800000MB. Is there a better way? This is my code:
for i in range(0, noImages):
    image = Image.open(newimageList[i])
    if(i == 0):
        image.save(namePath, "PDF" ,quality=100, optimize=True, save_all=True)
    else:
        namePath2 = rootPDF+pdf_id+"temp.pdf"
        image.save(namePath2, "PDF" ,quality=100, optimize=True, save_all=True)
        pdfs = [namePath, namePath2]
        merger = PdfFileMerger()
        for f in pdfs:
            merger.append(PdfFileReader(f), 'rb')
        with open(namePath, 'wb') as new_file:
            merger.write(new_file)
        os.remove(namePath2)
Answers:
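Try fpdf:
from fpdf import FPDF

pdf = FPDF()
# imagelist is the list with all image filenames
# x, y, w, h are the position and size of the image on the page
# (millimetres with FPDF's default settings); set them to suit your layout
for image in imagelist:
    pdf.add_page()
    pdf.image(image, x, y, w, h)
pdf.output("yourfile.pdf", "F")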
You can check out their tutorial page as well.
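The snippet above leaves x, y, w and h open. One possible way to choose them, sketched here as a suggestion rather than anything taken from the fpdf docs, is to scale each image so that it fits inside the page margins while keeping its aspect ratio, and then centre it. The helper name fit_within and the 10 mm margin are made up for illustration; the 210 × 297 defaults assume FPDF's default A4 page measured in millimetres.

```python
def fit_within(img_w_px, img_h_px, page_w=210.0, page_h=297.0, margin=10.0):
    """Return (x, y, w, h) in page units for an image of the given pixel size.

    The image is scaled uniformly so it fits inside the page minus the
    margins, then centred. The defaults assume an A4 page in millimetres;
    the margin is an arbitrary example value.
    """
    if img_w_px <= 0 or img_h_px <= 0:
        raise ValueError("image dimensions must be positive")

    avail_w = page_w - 2 * margin
    avail_h = page_h - 2 * margin

    # The smaller of the two ratios is the one that keeps both sides in bounds.
    scale = min(avail_w / img_w_px, avail_h / img_h_px)
    w = img_w_px * scale
    h = img_h_px * scale

    # Centre the scaled image on the page.
    x = (page_w - w) / 2
    y = (page_h - h) / 2
    return x, y, w, h
```

With Pillow, Image.open(path).size gives the pixel width and height, so the loop body could become x, y, w, h = fit_within(*Image.open(image).size) followed by pdf.image(image, x, y, w, h). Scaling the placement only changes how large the image is drawn on the page, not how many bytes of image data end up in the PDF.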
What I would suggest is to convert the images to an intermediate multi-page TIFF file, and then convert that to PDF.
The software you need for that is convert from the ImageMagick suite (an essential tool for everybody working with bitmap graphics) and tiff2pdf from libtiff.
First, convert all the JPEGs to an intermediate TIFF file. This example presumes files named page001.jpg, page002.jpg, et cetera. Adjust as needed:
convert -density 150 -units PixelsPerInch page*.jpg -adjoin intermediate.tiff
(You only need to specify the resolution with -density if the files don’t include resolution metadata.)
Then convert the intermediate to PDF using JPEG compression:
tiff2pdf -r 150 -j -o output.pdf intermediate.tiff
As an example, I used this technique to convert four scanned pages (JPEG format, sRGB color, 1216×1704 pixels) totalling 990098 bytes into a 651149-byte PDF file, about 66% of the original size.
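Since the question is working in Python, the two commands can also be driven from a script. The following is only a sketch: the function names build_commands and run_pipeline are made up, the flags are copied as-is from the commands above, and it assumes convert and tiff2pdf are installed and on the PATH.

```python
import glob
import subprocess


def build_commands(jpeg_files, tiff_path="intermediate.tiff",
                   pdf_path="output.pdf", dpi=150):
    """Build the two argument lists for the JPEG -> TIFF -> PDF pipeline.

    The options mirror the shell commands in the answer; nothing is run here.
    """
    if not jpeg_files:
        raise ValueError("no input files given")

    convert_cmd = ["convert", "-density", str(dpi), "-units", "PixelsPerInch",
                   *jpeg_files, "-adjoin", tiff_path]
    tiff2pdf_cmd = ["tiff2pdf", "-r", str(dpi), "-j", "-o", pdf_path, tiff_path]
    return convert_cmd, tiff2pdf_cmd


def run_pipeline(pattern="page*.jpg", **kwargs):
    """Expand the filename pattern, then run both commands in order."""
    # No shell is involved, so the wildcard is expanded here instead.
    # Sorting keeps zero-padded names (page001.jpg, page002.jpg, ...) in order.
    files = sorted(glob.glob(pattern))
    for cmd in build_commands(files, **kwargs):
        # check=True raises CalledProcessError if a command fails.
        subprocess.run(cmd, check=True)
```

Calling run_pipeline() in the image folder would then correspond to running the two commands by hand; pass dpi=96 if you want the value to match the resolution mentioned in the question.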