Convert a PDF to text with Python

Question:

How can I get the content of a PDF file line by line in Python? I have searched Stack Overflow but could not find a good answer. Notes: pyPdf gives an assertion error; if possible, I would like something using slate and pdfminer.

Asked By: user873286


Answers:

From the command line, run pdf2txt.py (the script that ships with pdfminer):

    python /path/to/pdf2txt.py -o text.txt /path/to/yourpdf.pdf

You can then open the text file it produces and read it line by line with "for line in file:".
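As a rough sketch of that second step, assuming the command above has already written text.txt as UTF-8: the helper name iter_text_lines is made up for illustration and is not part of pdfminer.

```python
def iter_text_lines(path, encoding="utf-8"):
    """Yield each line of a text file without its trailing newline.

    Hypothetical helper; `path` is the file written by pdf2txt.py's -o option.
    """
    with open(path, encoding=encoding) as fh:
        # A file object is its own line iterator, so this never loads
        # the whole file into memory at once.
        for line in fh:
            yield line.rstrip("\n")
```

Calling it as "for line in iter_text_lines('text.txt'): print(line)" would walk the extracted text one line at a time. Note that the line breaks are whatever pdf2txt.py decided on, which may not match the visual lines in the PDF exactly.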

If you want to be more efficient, you would have to modify pdf2txt.py so that outfp is an in-memory io.StringIO object, which avoids writing a file and then reading it back.
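One way this in-memory idea might look without editing pdf2txt.py itself is sketched below. It assumes the pdfminer.six fork is installed and uses its extract_text_to_fp helper in place of the script; the function names lines_from_writer and pdf_lines are hypothetical and exist only for this example.

```python
import io


def lines_from_writer(write_text):
    """Run write_text(outfp) against an in-memory buffer and return its lines.

    Hypothetical helper: write_text is any callable that writes text to a
    file-like object, playing the role of outfp in pdf2txt.py.
    """
    outfp = io.StringIO()
    write_text(outfp)
    return outfp.getvalue().splitlines()


def pdf_lines(pdf_path):
    """Return the text of a PDF as a list of lines, with no temporary file."""
    # Assumption: pdfminer.six is installed. Imported lazily so the helper
    # above stays usable without it.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    def write_text(outfp):
        with open(pdf_path, "rb") as inf:
            # LAParams() turns on layout analysis, which groups characters
            # into lines and paragraphs.
            extract_text_to_fp(inf, outfp, laparams=LAParams())

    return lines_from_writer(write_text)
```

With that in place, "for line in pdf_lines('/path/to/yourpdf.pdf'):" would iterate over the extracted text directly. The whole document is held in memory, so for very large PDFs the file-based approach may still be the safer choice.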

Answered By: apple16