How to extract text using PyPDF2 without the verbose output

Question

I want to copy the contents from a PDF into a text file. I am able to extract the text using the following code:

from PyPDF2 import PdfReader
infile = open("input.pdf", 'rb')
reader = PdfReader(infile)
for i in reader.pages:
    text = i.extract_text()
    ...

However, I do not need the text to be output to the terminal. Is there a way to tell the method not to output it to the terminal? I could not see anything in the documentation for the method.

Update: Silly me, I was printing the PageObject later down in the code. That caused me to think the output was coming from the extract_text() method.

Asked By: user1720897

Answer 1

The code snippet you posted doesn’t print anything to the terminal for me on Windows unless I add an extra step:

print(text)

So I assume the verbose output happens somewhere beyond the last line of your code example:

text = i.extract_text()

That said, PyPDF2 is being sunset and development continues under the name pypdf (https://pypi.org/project/pypdf/). I’d suggest trying that package and checking whether the verbose output persists.

Here is code adapted for the new package, based on your original:

pip install pypdf

from pypdf import PdfReader

# Open the file with the pypdf library.
reader = PdfReader("input.pdf")

# Extract the text from each page. extract_text() returns a string
# and prints nothing on its own.
for page in reader.pages:
    text = page.extract_text()
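
Since the goal in the question is to copy the text into a text file, the loop can write each page's text straight to a file instead of printing it. Here is a minimal sketch, assuming pypdf is installed. The helper names pages_to_text_file and pdf_to_text_file and the output path are hypothetical and not part of pypdf:

```python
from pathlib import Path


def pages_to_text_file(pages, out_path):
    """Write the text of every page to out_path and return the page count.

    `pages` is any iterable of objects with an extract_text() method,
    for example `reader.pages` from pypdf. Nothing is printed.
    """
    count = 0
    with Path(out_path).open("w", encoding="utf-8") as out:
        for page in pages:
            # extract_text() returns a string; guard against an empty result.
            out.write(page.extract_text() or "")
            out.write("\n")
            count += 1
    return count


def pdf_to_text_file(pdf_path, out_path):
    """Hypothetical convenience wrapper: read a PDF and save its text."""
    # Imported here so the helper above can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return pages_to_text_file(reader.pages, out_path)
```

Calling pdf_to_text_file("input.pdf", "output.txt") would then write one block of text per page to output.txt. Because nothing calls print(), the terminal stays quiet.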

Answered By: Henry Wills
