How to read a PDF file in Python without converting it, on Unix?
Question:
pdfile = open("tutorial.pdf", "r")
xyz = pdfile.readlines()
pqr = pdfile.readline()  # always '' here: readlines() already consumed the file
for a in xyz:
    print(a)
This code does not display the actual content; instead, it displays question marks and boxes.
Answers:
A PDF file is not plain text – you can’t just print its bytes to the terminal. You’d need to use a PDF-reading library (see Python PDF library for some suggestions) to read it.
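Looking at the raw bytes can show why the loop above prints question marks and boxes. This is a minimal sketch using only the standard library; `looks_like_pdf` and `peek_pdf_header` are hypothetical helper names. It opens the file in binary mode (`"rb"`) and checks for the `%PDF-` signature that PDF files normally start with. Apart from that short header, much of a typical PDF is compressed binary stream data, which the terminal cannot render as text.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the data starts with the PDF header signature."""
    return data.startswith(PDF_MAGIC)


def peek_pdf_header(path: str, size: int = 16) -> bytes:
    """Read the first `size` bytes of a file in binary mode ("rb"), not text mode."""
    with open(path, "rb") as f:
        return f.read(size)
```

For example, `peek_pdf_header("tutorial.pdf")` should return bytes beginning with `b'%PDF-'` (such as `b'%PDF-1.4'`), and `looks_like_pdf` on those bytes returns `True`. The readable header is the exception; the content streams that follow are usually compressed, so a PDF-reading library is still needed to get the text.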
If you are working with textual PDF files, I would suggest using PDFMiner.
(A complete example can be found here: https://github.com/syllabs/pdf2text)
PDF files contain formatted (binary) data, so you cannot read them directly as text.
Use the pypdf module instead:
https://pypi.org/project/pypdf/
Once it is installed, you can read the file without converting it.