Why does PyPDF2.PdfFileReader() require a file object as an input?

Question:

csv.reader() doesn’t require a file object, nor does open(). Does PyPDF2.PdfFileReader() require a file object because of the complexity of the PDF format, or is there some other reason?

Asked By: Zev Averbach


Answers:

It’s just a matter of how the library was written. csv.reader accepts any iterable that yields strings, which includes text-mode file objects. open is the function that opens the file, so naturally it takes a path rather than an already-open file (although it also accepts an integer file descriptor). Typically, it is better to open the file yourself, usually within a with block so that it is closed properly.

from PyPDF2 import PdfFileReader

with open('input.pdf', 'rb') as f:
    reader = PdfFileReader(f)  # do something with the file while it is open
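
To illustrate the two points about csv.reader and open, here is a minimal sketch using only the standard library. The sample rows and the temporary file are made up for the demonstration; they are not from the question.

```python
import csv
import os
import tempfile

# csv.reader accepts any iterable of strings, not only file objects.
rows = list(csv.reader(["name,pages", "report,12"]))
print(rows)

# open() normally takes a path, but it also accepts an integer file descriptor.
fd, path = tempfile.mkstemp(suffix=".txt")  # temporary file used only for this demo
with open(fd, "w") as f:  # closing f also closes the descriptor
    f.write("hello")

# Re-open the same file by path to read back what was written.
with open(path) as f:
    text = f.read()
print(text)

os.remove(path)  # clean up the temporary file
```

Neither function needs a file object, but for different reasons: csv.reader only iterates over lines, while open is the thing that produces the file object in the first place.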
Answered By: davidism

pypdf can take a BytesIO stream or a file path as well. I actually recommend passing the file path in most cases as pypdf will then take care of closing the file for you.

Answered By: Martin Thoma