How can I decrypt a PDF using PyPDF2?

Question

I am currently using PyPDF2 as a dependency.

I have encountered some encrypted files and handled
them the usual way, as in the following code:

from PyPDF2 import PdfReader

reader = PdfReader(pdf_filepath)
if reader.is_encrypted:
    reader.decrypt("")
    print(len(reader.pages))

My file path looks something like "~/blah/FDJKL492019 21490 ,LFS.pdf".
reader.decrypt("") returns 1, which means it was successful. But when it reaches len(reader.pages) (getNumPages() in older PyPDF2 versions),
it still raises the error "PyPDF2.utils.PdfReadError: File has not been decrypted".

How do I get rid of this error?
I can open the PDF file just fine by double-clicking it (it opens in Adobe Reader by default).

Asked By: Jin Lee

Answer 1

To answer my own question:
If there are ANY spaces in the file name, PyPDF2's decrypt function will ultimately fail despite returning a success code.
Stick to underscores when naming your PDFs before you run them through PyPDF2.

For example, rather than “FDJKL492019 21490 ,LFS.pdf”, use something like “FDJKL492019_21490_,LFS.pdf”.
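
The renaming suggested above can be scripted. This is a minimal sketch using only the standard library; underscore_name and rename_with_underscores are hypothetical helper names, not part of PyPDF2. Since Python does not expand a leading "~" in a path by itself, the sketch also calls expanduser() before touching the file.

```python
from pathlib import Path


def underscore_name(path):
    """Return the same path with spaces in the file name replaced by underscores."""
    path = Path(path)
    return path.with_name(path.name.replace(" ", "_"))


def rename_with_underscores(path):
    """Rename the file on disk and return its new path (hypothetical helper)."""
    source = Path(path).expanduser()  # resolve a leading "~" to the home directory
    target = underscore_name(source)
    if target != source:
        source.rename(target)
    return target
```

Only the file name is changed; directory names that contain spaces are left alone.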

Answered By: Jin Lee

Answer 2

This error may be caused by 128-bit AES encryption on the PDF; see "Query – is there a way to bypass security restrictions on a pdf?"

One workaround is to decrypt every PDF for which isEncrypted is true with qpdf:

qpdf --password='' --decrypt input.pdf output.pdf

Even if your PDF does not appear password protected, it may still be encrypted with no password. The above snippet assumes this is the case.
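
If you would rather run that command from Python, the following is a rough sketch. It assumes the qpdf executable is installed and on your PATH; qpdf_decrypt_command and decrypt_with_qpdf are hypothetical helper names. Passing the arguments as a list, without a shell, means file names with spaces need no extra quoting.

```python
import subprocess


def qpdf_decrypt_command(src, dst, password=""):
    """Build the qpdf argument list for decrypting src into dst."""
    return ["qpdf", f"--password={password}", "--decrypt", src, dst]


def decrypt_with_qpdf(src, dst, password=""):
    """Run qpdf and return its exit code (assumes qpdf is on PATH)."""
    # An argument list (no shell) keeps file names with spaces intact.
    result = subprocess.run(qpdf_decrypt_command(src, dst, password))
    return result.returncode
```

For example, decrypt_with_qpdf("input.pdf", "output.pdf") runs the same command as above; a return code of 0 means qpdf reported no errors or warnings.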

Answered By: Luke Rehmann

Answer 3

The error raised by getNumPages() has nothing to do with whether the file has actually been decrypted.

If we take a look at the source code of getNumPages():

def getNumPages(self):
    """
    Calculates the number of pages in this PDF file.

    :return: number of pages
    :rtype: int
    :raises PdfReadError: if file is encrypted and restrictions prevent
        this action.
    """

    # Flattened pages will not work on an Encrypted PDF;
    # the PDF file's page count is used in this case. Otherwise,
    # the original method (flattened page count) is used.
    if self.isEncrypted:
        try:
            self._override_encryption = True
            self.decrypt('')
            return self.trailer["/Root"]["/Pages"]["/Count"]
        except:
            raise utils.PdfReadError("File has not been decrypted")
        finally:
            self._override_encryption = False
    else:
        if self.flattenedPages == None:
            self._flatten()
        return len(self.flattenedPages)

we notice that the self.isEncrypted property controls the flow. The isEncrypted property is read-only and does not change even after the PDF has been decrypted.

So the easy way to handle this is to add the password as a keyword argument, with the empty string as its default value, and pass your password when calling getNumPages() and any other method built on top of it.
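
As a rough illustration of that idea without editing the library, here is a sketch of a wrapper that takes the password as a keyword argument. It uses the newer PdfReader names from the question's code (is_encrypted, decrypt, pages) rather than getNumPages(); page_count and page_count_from_file are hypothetical helpers, not PyPDF2 functions.

```python
def page_count(reader, password=""):
    """Return the page count, decrypting first when the reader is encrypted.

    `reader` is expected to behave like PyPDF2's PdfReader: it needs
    `is_encrypted`, `decrypt(password)` and `pages`.
    """
    if reader.is_encrypted:
        # decrypt() returns a falsy value (0) when the password does not match.
        if not reader.decrypt(password):
            raise ValueError("wrong password; file has not been decrypted")
    return len(reader.pages)


def page_count_from_file(path, password=""):
    """Open `path` with PyPDF2 and count its pages (hypothetical wrapper)."""
    from PyPDF2 import PdfReader  # imported here so page_count has no hard dependency

    return page_count(PdfReader(path), password)
```

Calling page_count_from_file("example.pdf") tries the empty password, and page_count_from_file("example.pdf", password="secret") passes your own.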

Answered By: Zijian He

Answer 4

The following code could solve this problem:

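import os
import shutil
import subprocess

from PyPDF2 import PdfReader

filename = "example.pdf"
reader = PdfReader(filename)
if reader.is_encrypted:
    try:
        reader.decrypt("")
        print("File Decrypted (PyPDF2)")
    except Exception:
        # PyPDF2 could not handle the encryption, so fall back to qpdf.
        # Passing the arguments as a list (no shell) keeps file names
        # with spaces intact.
        shutil.copy(filename, "temp.pdf")
        subprocess.run(["qpdf", "--password=", "--decrypt", "temp.pdf", filename])
        os.remove("temp.pdf")
        print("File Decrypted (qpdf)")
        # Re-open the file, which qpdf has overwritten with a decrypted copy.
        reader = PdfReader(filename)
else:
    print("File Not Encrypted")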

Answered By: mathsyouth

Answer 5

You can try the PyMuPDF package. It can open encrypted files, and it solved my problem.

Reference: PyMuPDF Documentation

Answered By: marcin

Answer 6

Use qpdf from Python through the pikepdf library:

import pikepdf

pdf = pikepdf.open('unextractable.pdf')
pdf.save('extractable.pdf')

Answered By: Himanshu Gupta

Answer 7

I had a similar error with PyPDF2. Please try the pikepdf package instead!

Code:

from pikepdf import Pdf
import os

filename = input("Please enter the file name without the file extension: ")

for file in os.listdir():
    # Only process the PDF in the current directory that matches the input.
    if file == filename + ".pdf":
        in_pdf = Pdf.open(file)

        # Save each page as its own PDF file.
        for i, page in enumerate(in_pdf.pages):
            out_pdf = Pdf.new()
            out_pdf.pages.append(page)
            out_pdf.save("split-page%s.pdf" % i)

Answered By: aVral
