How can I decrypt a PDF using PyPDF2?
Question:
I am currently using PyPDF2 as a dependency.
I have encountered some encrypted files and handled
them as you normally would, with the following code:
from PyPDF2 import PdfReader

reader = PdfReader(pdf_filepath)
if reader.is_encrypted:
    reader.decrypt("")
print(len(reader.pages))
My filepath looks something like "~/blah/FDJKL492019 21490 ,LFS.pdf"
Calling decrypt("") returns 1, which means it was successful. But when it reaches the page count (print PDF.getNumPages() in the older PyPDF2 API),
it still raises the error "PyPDF2.utils.PdfReadError: File has not been decrypted".
How do I get rid of this error?
I can open the PDF file just fine by double-clicking it (which opens it in Adobe Reader by default).
Answers:
To answer my own question:
If you have ANY spaces in your file name, then PyPDF2's decrypt function will ultimately fail despite returning a success code.
Try to stick to underscores when naming your PDFs before you run them through PyPDF2.
For example, rather than "FDJKL492019 21490 ,LFS.pdf", use something like "FDJKL492019_21490_,LFS.pdf".
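If you want to apply that renaming advice from Python, a minimal sketch could look like the one below. The helper name underscored_path is made up for illustration and is not part of PyPDF2. It only computes the new path and assumes that replacing spaces is all you need; the actual rename is left to the caller.

```python
from pathlib import Path


def underscored_path(pdf_path):
    """Return the same path with spaces in the file name replaced by underscores.

    Hypothetical helper for illustration; it does not touch the file system.
    """
    path = Path(pdf_path).expanduser()  # also expands a leading "~"
    return path.with_name(path.name.replace(" ", "_"))


original = Path("blah") / "FDJKL492019 21490 ,LFS.pdf"
print(underscored_path(original).name)  # FDJKL492019_21490_,LFS.pdf
```

To rename the file on disk you could then call Path(original).rename(underscored_path(original)).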
This error may come about due to 128-bit AES encryption on the PDF; see "Query – is there a way to bypass security restrictions on a pdf?"
One workaround is to decrypt all encrypted PDFs (those where isEncrypted is true) with qpdf:
qpdf --password='' --decrypt input.pdf output.pdf
Even if your PDF does not appear password protected, it may still be encrypted with no password. The above snippet assumes this is the case.
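If you would rather call qpdf from Python, something like the following sketch might work. The function names are my own and purely illustrative; the only qpdf options used are the --password and --decrypt flags from the command above. Passing the arguments as a list means file names containing spaces need no shell quoting. It assumes qpdf is installed and on your PATH.

```python
import shutil
import subprocess


def qpdf_decrypt_command(input_pdf, output_pdf, password=""):
    """Build the qpdf argument list (hypothetical helper, not a qpdf API)."""
    return [
        "qpdf",
        f"--password={password}",
        "--decrypt",
        str(input_pdf),
        str(output_pdf),
    ]


def decrypt_with_qpdf(input_pdf, output_pdf, password=""):
    """Run qpdf; raises CalledProcessError on a non-zero exit status."""
    if shutil.which("qpdf") is None:
        raise FileNotFoundError("qpdf executable not found on PATH")
    subprocess.run(qpdf_decrypt_command(input_pdf, output_pdf, password), check=True)


# Only builds and prints the command; nothing is executed here.
print(qpdf_decrypt_command("my input.pdf", "output.pdf"))
```

Call decrypt_with_qpdf("input.pdf", "output.pdf") to actually run it, then open output.pdf with your PDF library as usual.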
The error has nothing to do with whether the file has been decrypted when you call getNumPages().
If we take a look at the source code of getNumPages():
def getNumPages(self):
    """
    Calculates the number of pages in this PDF file.

    :return: number of pages
    :rtype: int
    :raises PdfReadError: if file is encrypted and restrictions prevent
        this action.
    """
    # Flattened pages will not work on an Encrypted PDF;
    # the PDF file's page count is used in this case. Otherwise,
    # the original method (flattened page count) is used.
    if self.isEncrypted:
        try:
            self._override_encryption = True
            self.decrypt('')
            return self.trailer["/Root"]["/Pages"]["/Count"]
        except:
            raise utils.PdfReadError("File has not been decrypted")
        finally:
            self._override_encryption = False
    else:
        if self.flattenedPages == None:
            self._flatten()
        return len(self.flattenedPages)
we will notice that it is the self.isEncrypted property that controls the flow. The isEncrypted property is read-only and does not change even after the PDF is decrypted.
So the easy way to handle the situation is to add the password as a keyword argument, with an empty string as the default value, and pass your password when calling getNumPages() and any other method built on top of it.
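To make that suggestion concrete, here is a rough sketch of the control flow with a password keyword argument. StandInReader is a hypothetical stand-in written for this example; it is NOT PyPDF2 code and only mimics the branching in the method quoted above (for simplicity its decrypt raises on a wrong password, which is an assumption of the sketch).

```python
class StandInReader:
    """Hypothetical stand-in that mimics only the control flow quoted above.

    The real class parses a PDF; this one just stores a password and a
    page count.
    """

    def __init__(self, page_count, user_password=None):
        self._page_count = page_count
        self._user_password = user_password
        # Like isEncrypted, this flag never changes after construction.
        self.is_encrypted = user_password is not None

    def decrypt(self, password):
        if password != self._user_password:
            raise ValueError("wrong password")
        return 1

    def get_num_pages(self, password=""):
        # The proposed change: forward a caller-supplied password instead
        # of always trying the empty string.
        if self.is_encrypted:
            try:
                self.decrypt(password)
            except ValueError:
                raise RuntimeError("File has not been decrypted") from None
        return self._page_count


reader = StandInReader(page_count=3, user_password="secret")
print(reader.get_num_pages(password="secret"))  # 3
```

With the default empty password the same call raises "File has not been decrypted", which mirrors the behaviour described in the question.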
The following code could solve this problem:
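import os
import shlex
from PyPDF2 import PdfReader

filename = "example.pdf"
reader = PdfReader(filename)
if reader.is_encrypted:
    try:
        # Try the empty password first.
        reader.decrypt("")
        print("File Decrypted (PyPDF2)")
    except Exception:
        # Fall back to qpdf, rewriting the file in place via a temporary copy.
        # shlex.quote keeps file names with spaces intact in the shell command.
        quoted = shlex.quote(filename)
        command = (
            "cp "
            + quoted
            + " temp.pdf; qpdf --password='' --decrypt temp.pdf "
            + quoted
            + "; rm temp.pdf"
        )
        os.system(command)
        print("File Decrypted (qpdf)")
        reader = PdfReader(filename)
else:
    print("File Not Encrypted")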
You can try the PyMuPDF package; it can open encrypted files and solved my problems.
Reference: PyMuPDF Documentation
I had a similar error with PyPDF2; please try the pikepdf package instead!
Code:
from pikepdf import Pdf
import os

filename = input('Please enter file name without file extension!')
for file in os.listdir():
    if file.endswith(".pdf") and file == filename + '.pdf':
        in_pdf = Pdf.open(file)
        # Write each page to its own PDF file.
        for i, page in enumerate(in_pdf.pages):
            out_pdf = Pdf.new()
            out_pdf.pages.append(page)
            out_pdf.save("split-page%s.pdf" % i)