Page count after using PdfFileMerger() in PyPDF2

Question:

I am trying to use PdfFileMerger() in PyPDF2 to merge PDF files (see the code below).

from PyPDF2 import PdfFileMerger, PdfFileReader

[...]

merger = PdfFileMerger()

if (some condition):
    merger.append(PdfFileReader(open(filename1, 'rb')))
    merger.append(PdfFileReader(open(filename2, 'rb')))
if (test for non-zero file size):
    merger.write("output.pdf")

However, my merge commands are subject to certain conditions and it could turn out that no merged pdf file is generated. I would like to know how to determine the page count after performing merges using PdfFileMerger(). If nothing else, I would like to know if the number of pages is non-zero. Maintaining a counter to do this would be cumbersome because I am performing the merges across several functions and would prefer a more elegant solution.

Asked By: arbitguy


Answers:

I’m in more or less the same situation as you, so I’ll explain my solution. I don’t open the PDFs from disk with PdfFileReader('filename.pdf'); instead I pass the content of each PDF as bytes in an array (pdfs_content_array). I then prepare the merger and an output buffer. I don’t want to save the generated file locally, so I use io.BytesIO to hold the merged content.

calc_page_sum accumulates the expected page count so that it can be compared with the result. The most important line is calc_page_sum += PdfFileReader(bytes_content).getNumPages(): it opens the bytes content with PdfFileReader and adds its number of pages to the total. Each buffer is then appended with merger.append(bytes_content). Finally, I write the merge into the output buffer and compare its page count with calc_page_sum. That’s it.

from PyPDF2 import PdfFileMerger, PdfFileReader
import io

[...]

def merge_the_pdfs(self, pdfs_content_array, output_file):
    merger = PdfFileMerger()
    output = io.BytesIO()
    calc_page_sum = 0  # expected total, summed from the inputs

    for content in pdfs_content_array:
        bytes_content = io.BytesIO(content)
        # Count the pages of each input before appending it.
        calc_page_sum += PdfFileReader(bytes_content).getNumPages()
        yield self.application.cpupool.submit(merger.append, bytes_content)

    merger.write(output)
    # Compare the expected total with the page count of the merged result.
    if calc_page_sum != PdfFileReader(output).getNumPages():
        return None

    # Return the merged PDF as bytes.
    return output.getvalue()

Hope this will help!

2nd Version:

from PyPDF2 import PdfFileMerger, PdfFileReader
import io
import sys

filename1 = 'test.pdf'
filename2 = 'test1.pdf'

merger = PdfFileMerger()
output = io.BytesIO()
calc_page_sum = 0

filesarray = [filename1,filename2]

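for singlefile in filesarray:
    # PdfFileReader accepts a path; its second positional argument is
    # `strict`, not a file mode, so 'rb' must not be passed here.
    calc_page_sum += PdfFileReader(singlefile).getNumPages()
    merger.append(PdfFileReader(singlefile))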

merger.write(output)
print(calc_page_sum)
print(PdfFileReader(output).getNumPages())

if calc_page_sum == PdfFileReader(output).getNumPages():
    print("It worked")
    merger.write("merging-test.pdf")
    sys.exit()

print("It didn't work")
sys.exit()
Answered By: Blue Stork

Maybe you can test the merger’s page list directly, using

if len(merger.pages) > 0:

in place of your condition

if (test for non-zero file size):
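
The check above can be wrapped in a small helper so that every function performing merges can share it. This is only a sketch: write_if_nonempty is a hypothetical name, and it assumes the merger object exposes a pages list and a write() method, as PdfFileMerger does in the PyPDF2 versions that provide that class.

```python
def write_if_nonempty(merger, destination):
    """Write the merged PDF only if at least one page was appended.

    `merger` is assumed to behave like PyPDF2's PdfFileMerger: it exposes
    a `pages` list and a `write()` method. `write_if_nonempty` itself is a
    hypothetical helper, not part of PyPDF2.

    Returns the number of merged pages.
    """
    page_count = len(merger.pages)
    if page_count > 0:
        merger.write(destination)
    return page_count
```

After all the conditional append() calls, it would be used as write_if_nonempty(merger, "output.pdf"). The return value doubles as the page count asked about in the question, so no separate counter has to be carried between functions.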
Answered By: CC5