Why does PyPDF2.PdfFileReader() require a file object as input?
Question:
csv.reader() doesn’t require a file object, nor does open(). Does PyPDF2.PdfFileReader() require a file object because of the complexity of the PDF format, or is there some other reason?
Answers:
It’s just a matter of how the library was written. csv.reader accepts any iterable that yields strings (which includes text files). open is the function that opens the file, so of course it doesn’t take an already open file (although it can take an integer file descriptor that refers to one). Typically, it is better to handle the file yourself, usually within a with block so that it is closed properly:
with open('input.pdf', 'rb') as f:
    pass  # do something with the file
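To make the point about csv.reader and open concrete, here is a small standard-library sketch. The sample data and the temporary file are made up for illustration; the calls themselves are ordinary csv, os, and tempfile functions.

```python
import csv
import os
import tempfile

# csv.reader accepts any iterable of strings, not only file objects.
rows_from_list = list(csv.reader(["a,b,c", "1,2,3"]))

# Write a small sample file so the remaining calls have something to read.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    f.write("a,b,c\n1,2,3\n")

# An open text file is also an iterable of strings, so it works the same way.
with open(path, newline="") as f:
    rows_from_file = list(csv.reader(f))

# open() also accepts an integer file descriptor instead of a path.
raw_fd = os.open(path, os.O_RDONLY)
with open(raw_fd, newline="") as f:  # closing f also closes raw_fd
    rows_from_fd = list(csv.reader(f))

os.remove(path)
print(rows_from_list == rows_from_file == rows_from_fd)
```

All three routes produce the same rows, which is why csv.reader has no need to insist on a file object.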
pypdf can take a BytesIO stream or a file path as well. I actually recommend passing the file path in most cases, as pypdf will then take care of closing the file for you.