PyPDF2 PdfFileWriter – Splitting a PDF appends pages to the previous file rather than saving each page as its own file
Question:
I have a dictionary with about 30 key/value pairs, each mapping a name to a page number. I am looping through the dictionary, pulling the matching page out of a PDF, and trying to save that page as its own file.
It does most of what I want, but instead of saving each file with a single page, it keeps the pages from the previous iterations, adds the new page to them, and saves the result under the new name.
How do I save each page I loop over as its own file rather than appending it to the previous output?
reader = PdfFileReader(infile)
writer = PdfFileWriter()
for x, y in page_list.items():
    with open(x+'.pdf', 'wb') as outfile:
        writer.addPage(reader.getPage(y-1))
        writer.write(outfile)
Answers:
You should re-instantiate the writer inside the loop, as shown below:
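from PyPDF2 import PdfFileReader, PdfFileWriter

reader = PdfFileReader(infile)
for x, y in page_list.items():
    # A fresh writer per iteration, so each output starts with no pages
    writer = PdfFileWriter()
    with open(x+'.pdf', 'wb') as outfile:
        # Page numbers in page_list are 1-based; getPage() is 0-based
        writer.addPage(reader.getPage(y-1))
        writer.write(outfile)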
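If the writer is instantiated once outside the loop, every addPage() call adds to the same writer, so each file you write contains the new page plus all the pages added in earlier iterations. That is the appending behaviour you are seeing.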
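Alternatively, if you want every page of the document as a separate file, a faster approach is the pdftk command-line tool: pdftk input.pdf burst. By default this writes the pages as pg_0001.pdf, pg_0002.pdf, and so on, so you would still need to rename the files you care about.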