How to use PyPDF2 to check if a PDF password is correct

Question:

I am doing a challenge problem from Automate the Boring Stuff with Python. The task is to write a program that will “brute force” a pdf password using a provided dictionary.txt file.

Using a file I encrypted and know the password to, and a dictionary file that contains that password, I cannot get my code to “figure out” that it is the password. Instead it runs until the end of the dictionary file, then stops.

I thought I might be misunderstanding how pdfObject.decrypt() would work in an “if” statement, so I made a little test program to play with the syntax, but it works in that program using the same syntax as my “main” program. In the “test” program I provide the password from input() instead of argv and a list, but I can’t see if/how that’s affecting it.

My program (that blitzes right past the correct password):

#!/usr/bin/python

# bruteForce.py - Uses a dictionary attack to crack the password of an
# encrypted pdf

from sys import argv
import PyPDF2

# Take the argument for the file you want to attack and open it
if len(argv) > 1:
    pdfFilename = argv[1]
else:
    print('User must specify a file.')
    exit()

# Open the pdf as an object
pdfFile = open(pdfFilename, 'rb')
pdfObject = PyPDF2.PdfFileReader(pdfFile)

# Add the contents of the dictionary file to a data structure
dictionaryList = []
dictionaryFile = open('dictionary.txt')
for word in dictionaryFile.readlines():
    dictionaryList.append(word)
    wordLower = word.lower()
    dictionaryList.append(wordLower)
dictionaryFile.close()

# Iterate over the data structure, trying each password as lowercase
print('Trying passwords...')
for word in dictionaryList:
    password = str(word)
    print('Trying ' + password)
    if pdfObject.decrypt(password) == 1:
        print('Password is ' + word)
        exit()
    else:
        continue

print('Password not found')

My test (that works and returns 'Yup, that's it!'):

#!/usr/bin/python
# Figuring out how to deal with foo.decrypt to get a value of 1 or 0
# This first test returns the correct result with the correct password,
# so presumably bruteForce is not feeding it the right password somehow

import PyPDF2

filename = input('What file?')
password = input('What password?')

pdfFile = open(filename, 'rb')
pdfObject = PyPDF2.PdfFileReader(pdfFile)

if pdfObject.decrypt(password) == 1:
    print("Yup, that's it!")
else:
    print('Nope!')



I expect the program to arrive at the correct word in the dictionary, try it, and stop. Instead, it runs to the end of the list and says "Password not found."
Asked By: partial_mask


Answers:

The dictionary entries still contained the trailing newlines from the text file, so they never matched the password. I stripped them with wordStripped = word.strip('\n') before adding them to dictionaryList, and the program worked as expected (and about twice as fast).
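
A minimal sketch of that fix, under a few assumptions: the helper names load_candidates and find_password are hypothetical, and the password check is passed in as a function so the loop can be tried without a PDF at hand. In the PyPDF2 versions that provide PdfFileReader, decrypt() returns 0 when the password fails, 1 when it matches the user password, and 2 when it matches the owner password, so comparing against 0 may be safer than testing for == 1.

```python
def load_candidates(lines):
    """Return candidate passwords with line endings stripped.

    Each word is kept as written and also in lowercase; duplicates
    and empty lines are skipped.
    """
    candidates = []
    seen = set()
    for line in lines:
        word = line.rstrip('\r\n')  # drop the newline that readlines() keeps
        if not word:
            continue
        for variant in (word, word.lower()):
            if variant not in seen:
                seen.add(variant)
                candidates.append(variant)
    return candidates


def find_password(candidates, try_password):
    """Return the first candidate accepted by try_password, or None."""
    for candidate in candidates:
        if try_password(candidate):
            return candidate
    return None


# Stand-in for the lines of dictionary.txt, so the sketch runs on its own.
sample_lines = ['APPLE\n', 'Banana\n', '\n', 'apple\n']
sample_candidates = load_candidates(sample_lines)

# Stand-in check; with PyPDF2 it might instead look like:
#   pdfObject = PyPDF2.PdfFileReader(open(pdfFilename, 'rb'))
#   find_password(sample_candidates, lambda pw: pdfObject.decrypt(pw) != 0)
found = find_password(sample_candidates, lambda pw: pw == 'banana')
print(found)
```

Without the rstrip() call, the candidate tried would be 'banana\n', which is a different string from 'banana' and would be rejected.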

Answered By: partial_mask

If you use:

word_list = open("dictionary.txt").readlines()

then each entry in the list still ends with the newline character ('\n') from the file. You can test this by printing out the list.

To make the project work, I used the .strip method to remove the '\n' as I iterated through the list.
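
As a small illustration of the point about readlines(), the sketch below uses an in-memory io.StringIO object in place of dictionary.txt (an assumption made so the example runs without the file). The two sample words are made up.

```python
import io

# Stand-in for open("dictionary.txt"); the contents are illustrative only.
dictionary_file = io.StringIO('AARDVARK\nABACUS\n')

word_list = dictionary_file.readlines()
print(word_list)   # the printed list shows the '\n' kept on each entry

# Strip the newline from each entry before using it as a password.
stripped_words = [word.strip('\n') for word in word_list]
print(stripped_words)
```

Printing the list (rather than each word on its own) makes the trailing '\n' visible, because the list display shows the repr of each string.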

Answered By: Sean Massey