How to read a PDF file in Python without converting it in Unix?

Question:

pdfile=open("tutorial.pdf","r")
xyz= pdfile.readlines()
pqr=pdfile.readline()
for a in xyz:
    print a

This code does not display the actual content. Instead, it displays question marks and boxes.

Asked By: user2659761


Answers:

A PDF file is not plain text – you can’t just print its bytes to the terminal. You’d need to use a PDF-reading library (see Python PDF library for some suggestions) to read it.
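
A minimal sketch of what "not plain text" means in practice, using only the standard library. The helper name `looks_like_pdf` is hypothetical; the one fact it relies on is that a well-formed PDF starts with the bytes `%PDF-`. Most of what follows that header is usually compressed binary data, which is why printing it produces question marks and boxes.

```python
PDF_MAGIC = b"%PDF-"  # header bytes at the start of a well-formed PDF


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF header.

    Hypothetical helper for illustration only.
    """
    # "rb" reads raw bytes; text mode ("r") would try to decode them
    # as characters, which is what garbles the output in the question.
    with open(path, "rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC
```

Calling `looks_like_pdf("tutorial.pdf")` only confirms the file type; extracting the text still requires a PDF library.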

Answered By: RichieHindle

If you are working with textual PDF files, I would suggest using PDFMiner.
(A complete example can be found here: https://github.com/syllabs/pdf2text)
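
A short sketch of that suggestion, assuming the maintained `pdfminer.six` fork is installed (`pip install pdfminer.six`) rather than the original PDFMiner release. The wrapper name `pdf_to_text` is hypothetical; `extract_text` is the high-level function that `pdfminer.six` provides.

```python
def pdf_to_text(source):
    """Return all text in a PDF as one string.

    `source` may be a file path or a binary file object.
    Hypothetical wrapper for illustration only.
    """
    # Third-party dependency (assumed installed): pip install pdfminer.six
    # Imported here so this file still loads where the library is missing.
    from pdfminer.high_level import extract_text

    return extract_text(source)


# Example usage with the file from the question:
# print(pdf_to_text("tutorial.pdf"))
```

This only recovers text that is actually stored as text in the PDF; scanned pages that are images would need OCR instead.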

Answered By: user1498724

PDF files contain formatted binary data, so you cannot read them directly as text.

Use the pypdf module instead:
https://pypi.org/project/pypdf/
Install it and you can read the PDF without converting it first.
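
A minimal sketch of reading the file with pypdf, assuming it has been installed with `pip install pypdf`. The function name `extract_pdf_text` is hypothetical; `PdfReader`, its `pages` attribute, and `extract_text()` come from pypdf.

```python
def extract_pdf_text(source):
    """Return a list with the extracted text of each page.

    `source` may be a file path or a binary file object.
    Hypothetical helper for illustration only.
    """
    # Third-party dependency (assumed installed): pip install pypdf
    # Imported here so this file still loads where the library is missing.
    from pypdf import PdfReader

    reader = PdfReader(source)
    # extract_text() gives the text pypdf can recover from one page;
    # fall back to "" so every entry in the list is a string.
    return [page.extract_text() or "" for page in reader.pages]


# Example usage with the file from the question:
# for page_text in extract_pdf_text("tutorial.pdf"):
#     print(page_text)
```

How good the result is depends on how the PDF was produced, so the extracted text may lose layout or spacing.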

Answered By: no1