Split specific pages of PDF and save it with Python

Question:

I am trying to split 20 pages of pdf file (single) , into five respective pdf files , 1st pdf contains 1-3 pages , 2nd pdf file contains only 4th page, 3rd pdf contains 5 to 10 pages, 4th pdf contains 11-17 pages , and 5th pdf contains 18-20 page . I need the working code in python. The below mentioned code splits the entire pdf file into single pages, but I want the grouped pages..

    from PyPDF2 import PdfFileWriter, PdfFileReader
    inputpdf = PdfFileReader(open("input.pdf", "rb"))
    for i in range(inputpdf.numPages):
    j = i+1    
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open("page%s.pdf" % j, "wb") as outputStream:
    output.write(outputStream)
Asked By: Sutirtha Thakur

||

Answers:

if you have python 3, you can use tika according to the following answer here:

How to extract text from a PDF file?

Answered By: Sa'ad

For me it looks like task for pdfrw using this example from GitHub I written following example code:

from pdfrw import PdfReader, PdfWriter
pages = PdfReader('inputfile.pdf').pages
parts = [(3,6),(7,10)]
for part in parts:
    outdata = PdfWriter(f'pages_{part[0]}_{part[1]}.pdf')
    for pagenum in range(*part):
        outdata.addpage(pages[pagenum-1])
    outdata.write()

This one create two files: pages_3_6.pdf and pages_7_10.pdf each with 3 pages i.e. 3,4,5 and 7,8,9. Note pagenum-1 in code, that -1 is used due to fact that pdf pages numeration starts at 1 rather than 0. I also used so-called f-strings to get names of output files. In my opinion it is slick method but it is not available in Python2 and I am not sure if it is available in all Python3 versions (I tested my code in 3.6.7), so you might use old formatting method instead if you wish.
Remember to alter filenames and ranges accordingly to your needs.

Answered By: Daweo

How to extract specific pages (or split specific pages) from a PDF file and save those pages as a separate PDF using Python.

pip install PyPDF2 # to install module/package

from PyPDF2 import PdfFileReader, PdfFileWriter

pdf_file_path = 'Unknown.pdf'
file_base_name = pdf_file_path.replace('.pdf', '')

pdf = PdfFileReader(pdf_file_path)

pages = [0, 2, 4] # page 1, 3, 5
pdfWriter = PdfFileWriter()

for page_num in pages:
    pdfWriter.addPage(pdf.getPage(page_num))

with open('{0}_subset.pdf'.format(file_base_name), 'wb') as f:
    pdfWriter.write(f)
    f.close()

CREDIT :
How to extract PDF pages and save as a separate PDF file using Python

Answered By: thrinadhn
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.