How to extract text using PyPDF2 without the verbose output
Question:
I want to copy the contents from a PDF into a text file. I am able to extract the text using the following code:
from PyPDF2 import PdfReader
infile = open("input.pdf", 'rb')
reader = PdfReader(infile)
for i in reader.pages:
    text = i.extract_text()
    ...
However, I do not need the text to be output to the terminal. Is there a way to tell the method not to print it? I could not find anything about this in the method's documentation.
Update: Silly me, I was printing the PageObject later in the code, which made me think the output was coming from the extract_text() method.
Answers:
The code snippet you posted doesn't print anything to the terminal for me on Windows unless I add an extra step:
print(text)
So I assume the verbose output comes from somewhere after the last line of your code example:
text = i.extract_text()
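If it isn't obvious which line is printing, one general Python technique for checking (or suppressing) a call's console output is to run it with stdout redirected. This is a minimal standard-library sketch, not something PyPDF2 provides; `run_quietly` and `noisy_extract` are hypothetical names used only for illustration:

```python
import contextlib
import io


def run_quietly(func, *args, **kwargs):
    """Call func, discard anything it prints to stdout, and return its result."""
    # redirect_stdout temporarily points sys.stdout at an in-memory buffer,
    # so print() calls made inside func never reach the terminal.
    with contextlib.redirect_stdout(io.StringIO()):
        return func(*args, **kwargs)


def noisy_extract():
    # Hypothetical stand-in for code that prints as a side effect.
    print("this would normally appear in the terminal")
    return "extracted text"


text = run_quietly(noisy_extract)
```

Note that this only captures Python-level writes to sys.stdout; anything written to stderr (where the logging module sends messages by default) is not affected.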
That being said, PyPDF2 is being sunset, and development continues as pypdf, a new Python package (https://pypi.org/project/pypdf/). I'd suggest trying the new package and checking whether the verbose output persists.
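If the extra output turns out to be warnings from the library itself rather than your own print() calls, pypdf's documentation describes silencing them through Python's standard logging module. A minimal sketch, assuming the messages are emitted under the "pypdf" logger name (some messages may instead go through the warnings module, which this does not cover):

```python
import logging

# Assumption: pypdf logs through loggers named "pypdf" and "pypdf.<module>".
pypdf_logger = logging.getLogger("pypdf")

# Only let ERROR and above through; WARNING-level messages are dropped.
pypdf_logger.setLevel(logging.ERROR)

# Child loggers such as "pypdf._reader" inherit this level unless they set their own.
```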
Here is your original code adapted for the new package (install it first with pip install pypdf):
from pypdf import PdfReader
# Read the file using the pypdf library.
reader = PdfReader("input.pdf")
# Obtain the total number of pages as an integer
pages = len(reader.pages)
# Extract the text from each page
for page_number in range(pages):
    page = reader.pages[page_number]
    text = page.extract_text()
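To finish the original goal of copying the PDF's contents into a text file, the extracted strings can be written straight to disk without printing anything. A minimal sketch; `write_text_file` is a hypothetical helper, and it accepts any objects with an extract_text() method so the logic can be checked without a real PDF:

```python
from typing import Iterable, Protocol


class HasExtractText(Protocol):
    """Anything with an extract_text() method, e.g. a pypdf PageObject."""

    def extract_text(self) -> str: ...


def write_text_file(pages: Iterable[HasExtractText], out_path: str) -> int:
    """Write the text of every page to out_path and return the page count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for page in pages:
            # extract_text() only returns a string; nothing is printed here.
            out.write(page.extract_text())
            # Newline between pages so consecutive pages don't run together.
            out.write("\n")
            count += 1
    return count
```

With pypdf installed, it would be called as write_text_file(PdfReader("input.pdf").pages, "output.txt").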