pdfminer

Python PDF read straight across as how it looks in the PDF

Question: If I use the code in the answer to "Extracting text from a PDF file using PDFMiner in python?", I can extract the text from this PDF: https://www.tencent.com/en-us/articles/15000691526464720.pdf However, under “CONSOLIDATED INCOME STATEMENT”, it reads down …

Total answers: 5

Python – Extracting text from webpage PDF

Question: I have come across a few posts that deal with converting PDFs to HTML or to text; however, they all work from a file saved to the computer. Is there a way to extract the text from a webpage PDF without …

Total answers: 2

Extracting text from a PDF file using PDFMiner in python?

Question: I am looking for documentation or examples on how to extract text from a PDF file using PDFMiner with Python. It looks like PDFMiner updated their API, and all the relevant examples I have found contain outdated code (classes and methods have changed). The libraries …

Total answers: 6

How to extract text and text coordinates from a PDF file?

Question: I want to extract all the text boxes and text box coordinates from a PDF file with PDFMiner. Many other Stack Overflow posts address how to extract all text in an ordered fashion, but how can I do the intermediate step of getting …

Total answers: 3

Syntax error while installing pdfminer using python

Question: I want to use pdfminer to extract text info. I have downloaded pdfminer-20131113 and installed Python in C:\python34. Using cmd, I set the path to pdfminer's setup.py file and run the following command: python setup.py install But I …

Total answers: 3

PDFminer gives strange letters

Question: I am using Python 2.7 and PDFminer to extract text from PDFs. I noticed that PDFminer sometimes gives me words with strange letters where PDF viewers don’t. For some PDF docs, the results returned by PDFminer and by PDF viewers are the same (strange), but there are docs where PDF viewers can …

Total answers: 2

Extract text per page with Python pdfMiner?

Question: I have experimented with both pypdf and pdfMiner to extract text from PDF files. I have some unfriendly PDFs that only pdfMiner is able to extract successfully. I am using the code here to extract text for the entire file. However, I would really like to extract …

Total answers: 2

How do I use pdfminer as a library

Question: I am trying to get text data from a PDF using pdfminer. I can extract this data to a .txt file with the pdfminer command-line tool pdf2txt.py. I currently do this and then use a Python script to clean up the .txt …

Total answers: 15