Python: get the number of pages from a password-protected PDF
Question:
I’ve been trying to figure out a way to get the number of pages from a password-protected PDF with Python 3. So far I have tried the pypdf and pdfminer2 modules.
Both fail because the file is not decrypted.
from pypdf import PdfReader
reader = PdfReader("encrypted.pdf")
print(len(reader.pages))
This code raises an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pypdf/_page.py", line 2155, in __len__
return self.length_function()
^^^^^^^^^^^^^^^^^^^^^^
File "pypdf/_reader.py", line 449, in _get_num_pages
return self.trailer[TK.ROOT]["/Pages"]["/Count"] # type: ignore
~~~~~~~~~~~~^^^^^^^^^
File "pypdf/generic/_data_structures.py", line 291, in __getitem__
return dict.__getitem__(self, key).get_object()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pypdf/generic/_base.py", line 290, in get_object
obj = self.pdf.get_object(self)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "pypdf/_reader.py", line 1359, in get_object
raise FileNotDecryptedError("File has not been decrypted")
pypdf.errors.FileNotDecryptedError: File has not been decrypted
Is there a way to get the number of pages without decrypting?
Answers:
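The following worked for me (decrypt with the password before accessing the pages):
from pypdf import PdfReader
reader = PdfReader('path/to/file.pdf')
reader.decrypt(password)  # password: the PDF's user or owner password, as a str
print(len(reader.pages))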
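A slightly more defensive variant of the same approach, as a minimal sketch: it assumes pypdf is installed, and the helper names unlock_and_count and count_pages are made up for illustration. It relies on decrypt() returning a falsy value when the password does not match, and it skips the call entirely for files that are not encrypted.

```python
def unlock_and_count(reader, password):
    """Return the page count of an already-opened pypdf PdfReader.

    Hypothetical helper: decrypts only when the file is encrypted.
    """
    if reader.is_encrypted:
        # decrypt() returns a falsy value (0 / NOT_DECRYPTED) on a wrong password
        if not reader.decrypt(password):
            raise ValueError("password did not unlock the PDF")
    return len(reader.pages)


def count_pages(path, password):
    """Open the PDF at `path` with pypdf and return its page count."""
    from pypdf import PdfReader  # imported here so pypdf is only needed when called

    return unlock_and_count(PdfReader(path), password)
```

Calling count_pages('path/to/file.pdf', password) should then either return the count or raise a ValueError instead of the FileNotDecryptedError from the question. For AES-encrypted files, pypdf may additionally need a crypto backend such as the cryptography package.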
I would recommend removing the read protection with a command-line tool such as qpdf (easy to install; on Ubuntu, for example, run apt-get install qpdf if you don’t have it already):
qpdf --password=PASSWORD --decrypt SECURED.pdf UNSECURED.pdf
Then open the unlocked file with pdfminer and continue from there.
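If only the page count is needed, qpdf can also report it directly, without writing an unlocked copy first. The sketch below wraps that call from Python; it assumes qpdf is on the PATH and that the installed version supports the --show-npages option, and both function names are hypothetical.

```python
import subprocess


def build_npages_command(pdf_path, password):
    """Build the qpdf argument list that prints the page count."""
    return ["qpdf", f"--password={password}", "--show-npages", pdf_path]


def qpdf_page_count(pdf_path, password):
    """Run qpdf and parse the page count it prints on stdout."""
    result = subprocess.run(
        build_npages_command(pdf_path, password),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit, e.g. wrong password
    )
    return int(result.stdout.strip())
```

For example, qpdf_page_count('SECURED.pdf', 'PASSWORD') runs qpdf once and returns the count as an int.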
You can use pdfrw.
Example: a.pdf and b.pdf are the same 30-page document, except that b.pdf is password protected and a.pdf has no protection.
>>> from pdfrw import PdfReader
>>> print(len(PdfReader('b.pdf').pages))
30
>>> print(len(PdfReader('a.pdf').pages))
30
To install it, run:
pip install pdfrw
For more details, see the pdfrw documentation.