Python get number of pages from password protected pdf

Question:

I’ve been trying to figure out a way to get the number of pages from a password-protected PDF with Python 3. So far I have tried the modules pypdf and pdfminer2.
Both fail because the file is not decrypted.

from pypdf import PdfReader
reader = PdfReader("encrypted.pdf")
print(len(reader.pages))

This code raises an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pypdf/_page.py", line 2155, in __len__
    return self.length_function()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "pypdf/_reader.py", line 449, in _get_num_pages
    return self.trailer[TK.ROOT]["/Pages"]["/Count"]  # type: ignore
           ~~~~~~~~~~~~^^^^^^^^^
  File "pypdf/generic/_data_structures.py", line 291, in __getitem__
    return dict.__getitem__(self, key).get_object()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pypdf/generic/_base.py", line 290, in get_object
    obj = self.pdf.get_object(self)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pypdf/_reader.py", line 1359, in get_object
    raise FileNotDecryptedError("File has not been decrypted")
pypdf.errors.FileNotDecryptedError: File has not been decrypted

Is there a way to get the number of pages without decrypting?

Asked By: JBeardie


Answers:

The following worked for me:

from pypdf import PdfReader
reader = PdfReader('path/to/file.pdf')
# decrypt() takes the password as a string; 'PASSWORD' is a placeholder
reader.decrypt('PASSWORD')
print(len(reader.pages))
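
A slightly more defensive version of the same idea, as a rough sketch: it only calls decrypt() when the reader reports the file as encrypted, and it checks the return value, which in pypdf is falsy when the password does not match. The helper name count_pages and the ValueError are my own choices for illustration, not part of pypdf.

```python
def count_pages(reader, password):
    """Return the page count of an already-opened pypdf PdfReader.

    `reader` is expected to be a pypdf.PdfReader (hypothetical helper,
    not part of pypdf itself).
    """
    if reader.is_encrypted:
        # decrypt() returns a falsy value (0 / NOT_DECRYPTED) when the
        # password matches neither the user nor the owner password.
        if not reader.decrypt(password):
            raise ValueError("wrong password, PDF could not be decrypted")
    # Once decrypted (or if never encrypted), the page list is readable.
    return len(reader.pages)
```

With pypdf installed, you would call it as count_pages(PdfReader('path/to/file.pdf'), 'PASSWORD'). Raising on a wrong password gives a clearer message than the FileNotDecryptedError that would otherwise surface later from len(reader.pages).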

Alternatively, I would recommend removing the read protection with a command-line tool such as qpdf (easy to install; on Ubuntu, for example, run apt-get install qpdf if you don’t have it already):

qpdf --password=PASSWORD --decrypt SECURED.pdf UNSECURED.pdf

Then open the unlocked file with pdfminer and do your stuff.
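
If all you need is the page count, qpdf can also print it directly with its --show-npages option, so no unlocked copy has to be written. Below is a minimal sketch of calling it from Python 3; the function names are made up for illustration, and it assumes qpdf is on your PATH and that the password is correct.

```python
import subprocess


def qpdf_npages_command(pdf_path, password):
    """Build the qpdf command line that prints the page count."""
    return ["qpdf", f"--password={password}", "--show-npages", pdf_path]


def qpdf_page_count(pdf_path, password):
    """Run qpdf and parse the page count it prints on stdout."""
    result = subprocess.run(
        qpdf_npages_command(pdf_path, password),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return int(result.stdout.strip())
```

Passing the arguments as a list avoids shell quoting problems when the password contains spaces or special characters.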

Answered By: Deepak Paudel

You can use pdfrw.

Example:

a.pdf and b.pdf are the same 30-page document. The only difference is that b.pdf is password protected and a.pdf has no protection.

>>> from pdfrw import PdfReader
>>> print(len(PdfReader('b.pdf').pages))
30
>>> print(len(PdfReader('a.pdf').pages))
30

To install it, use the following command:

pip install pdfrw

For more details, see the pdfrw project page.

Answered By: Kallz