Merging 2 PDF files gives me an empty PDF

Question:

I am using the following standard code:

# importing required modules
import PyPDF2

def PDFmerge(pdfs, output):
    # creating pdf file merger object
    pdfMerger = PyPDF2.PdfFileMerger()

    # appending pdfs one by one
    for pdf in pdfs:
        with open(pdf, 'rb') as f:
            pdfMerger.append(f)

    # writing combined pdf to output pdf file
    with open(output, 'wb') as f:
        pdfMerger.write(f)

def main():
    # pdf files to merge
    pdfs = ['example.pdf', 'rotated_example.pdf']

    # output pdf file name
    output  = 'combined_example.pdf'

    # calling pdf merge function
    PDFmerge(pdfs = pdfs, output = output)

if __name__ == "__main__":
    # calling the main function
    main()

But when I call this with my two PDF files (which contain only some text), it produces an empty PDF file. What could be causing this?

Asked By: HolyMonk

Answers:

The problem is that you’re closing the files before the write.

When you call pdfMerger.append, it doesn’t actually read and process the whole file then; it only does so later, when you call pdfMerger.write. Since the files you’ve appended are closed, it reads no data from each of them, and therefore outputs an empty PDF.
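
To make the failure mode concrete, here is a minimal stand-in, not PyPDF2 itself. `LazyMerger` and `read_after_close_fails` are hypothetical names invented for illustration; the class only mimics the deferred read described above. In plain Python, reading from a closed handle raises a `ValueError`, which is the kind of error you would hope to see instead of a silently empty output.

```python
import os
import tempfile


class LazyMerger:
    """Hypothetical stand-in (not PyPDF2): stores handles, reads them only in write()."""

    def __init__(self):
        self._sources = []

    def append(self, fileobj):
        # Only the handle is remembered here; no bytes are read yet.
        self._sources.append(fileobj)

    def write(self, out):
        # The deferred read happens here, so every source must still be open.
        for src in self._sources:
            out.write(src.read())


def read_after_close_fails():
    """Return True if writing after the input was closed raises ValueError."""
    # Create a small throwaway input file.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(b"some bytes")

    merger = LazyMerger()
    with open(path, "rb") as f:
        merger.append(f)  # f is closed as soon as this block ends

    try:
        with open(os.devnull, "wb") as out:
            merger.write(out)  # attempts to read from the closed handle
    except ValueError:
        return True
    finally:
        os.remove(path)
    return False
```

The same ordering appears in the question's code: each input is closed at the end of its `with` block, long before the final write.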

This should actually raise an exception, which would have made the problem and the fix obvious. Apparently this is a bug introduced in version 1.26, and it will be fixed in the next version. Unfortunately, while the fix was implemented in July 2016, there hasn’t been a next version since May 2016. (See this issue.)

You could install directly off the github master (and hope there aren’t any new bugs), or you could continue to wait for 1.27, or you could work around the bug. How? Simple: just keep the files open until the write is done:

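import contextlib
import PyPDF2

# pdfs and output are the same arguments as in the question's PDFmerge
with contextlib.ExitStack() as stack:
    pdfMerger = PyPDF2.PdfFileMerger()
    # every input file stays open until the ExitStack block exits
    files = [stack.enter_context(open(pdf, 'rb')) for pdf in pdfs]
    for f in files:
        pdfMerger.append(f)
    # the write happens while the inputs are still open
    with open(output, 'wb') as f:
        pdfMerger.write(f)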
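
For readers unfamiliar with `ExitStack`, the following stdlib-only sketch shows the file lifetimes that this workaround relies on. `concat_files` is a hypothetical helper that just concatenates raw bytes (it does not produce a valid merged PDF); it returns whether the inputs were open during the write and closed afterwards.

```python
import contextlib


def concat_files(paths, output):
    """Copy every input into output, keeping all inputs open until the end."""
    with contextlib.ExitStack() as stack:
        # enter_context registers each file so the stack closes it on exit.
        files = [stack.enter_context(open(p, "rb")) for p in paths]
        with open(output, "wb") as out:
            for f in files:
                out.write(f.read())
        # Still inside the ExitStack block: no input has been closed yet.
        still_open = all(not f.closed for f in files)
    # The ExitStack block has exited: every input is closed now.
    closed_after = all(f.closed for f in files)
    return still_open, closed_after
```

Calling `concat_files(['a.bin', 'b.bin'], 'out.bin')` on two existing files should return `(True, True)`.
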
Answered By: abarnert

A workaround that works for me is to append a PdfFileReader instance instead of an open file object.

from PyPDF2 import PdfFileMerger
from PyPDF2 import PdfFileReader

merger = PdfFileMerger()
for f in ['file1.pdf', 'file2.pdf', 'file3.pdf']:
    # PdfFileReader takes the path directly; append()'s second argument
    # would be a bookmark title, not a file mode
    merger.append(PdfFileReader(f))
with open('finished_copy.pdf', 'wb') as new_file:
    merger.write(new_file)

Hope that helps!
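
Another variation in the same spirit, offered only as a sketch: copy each input into an in-memory `io.BytesIO` buffer, so nothing depends on the original file handles staying open. `load_into_memory` is a hypothetical helper name, and the commented usage assumes that `PdfFileMerger.append` accepts any seekable file-like object; that assumption has not been checked against the 1.26 bug described above.

```python
import io


def load_into_memory(paths):
    """Read each file fully and return in-memory streams that stay readable."""
    buffers = []
    for path in paths:
        with open(path, "rb") as f:
            # The bytes are copied before f is closed by the with block.
            buffers.append(io.BytesIO(f.read()))
    return buffers


# Assumed usage with the merger from the question (not executed here):
#     merger = PyPDF2.PdfFileMerger()
#     for buf in load_into_memory(pdfs):
#         merger.append(buf)
#     with open(output, "wb") as out:
#         merger.write(out)
```

The trade-off is memory: every input is held in RAM until the merge finishes, which is fine for a handful of small files but may not suit very large ones.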

Answered By: Lizzy Presland