text-extraction

Extract Nouns From Dataframe and Store them into another Row

Question: I’m practicing NLP and have a problem. I have a dataset containing rows of sentences. POS tagging each row was easy. Now I want to extract the nouns from those rows and store them in another column of the same row. nouns = [] tags = …

Total answers: 1
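Because the rows are already POS-tagged, a minimal sketch is a per-row function that keeps the words whose tag is a noun tag. It assumes each row holds `(word, tag)` pairs in the Penn Treebank tagset (the format `nltk.pos_tag` returns), where every noun tag (NN, NNS, NNP, NNPS) starts with "NN". The function name `extract_nouns` is hypothetical. Building a fresh list for each row, rather than appending to one shared `nouns` list, keeps each row's nouns separate.

```python
from typing import Iterable, List, Tuple

# Assumption: Penn Treebank tags, where all noun tags (NN, NNS, NNP, NNPS) start with "NN".
NOUN_PREFIX = "NN"


def extract_nouns(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the words in one tagged row whose POS tag marks them as nouns."""
    # A new list is built per call, so each DataFrame row gets its own nouns.
    return [word for word, tag in tagged if tag.startswith(NOUN_PREFIX)]
```

With pandas, assuming the tagged rows live in a hypothetical `pos_tags` column, `df["nouns"] = df["pos_tags"].apply(extract_nouns)` writes each row's nouns into a new column of the same row.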

Textract Unsupported Document Exception

Question: I’m trying to use boto3 to run a Textract detect_document_text request with the following code: client = boto3.client('textract') response = client.detect_document_text( Document={ 'Bytes': image_b64['document_b64'] } ) where image_b64['document_b64'] is a base64-encoded image that I created using, for example, the https://base64.guru/converter/encode/image website. But I’m getting the following error: UnsupportedDocumentException What …

Total answers: 4
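One likely cause, offered as a hedged guess: the Textract documentation notes that when you call it through an AWS SDK you may not need to base64-encode the `Bytes` field yourself, because boto3 encodes blob parameters itself. Passing an already base64-encoded string therefore double-encodes it, and Textract no longer sees a valid image. The sketch below decodes the string back to raw bytes first (dropping a `data:...;base64,` prefix that converter sites often add) and checks the file signature. The helper names `b64_to_bytes`, `sniff_format`, and `detect_text` are hypothetical; the accepted formats listed (JPEG, PNG, PDF, TIFF) follow the Textract docs, but check which formats the operation you call accepts.

```python
import base64
import binascii
from typing import Optional

# Leading "magic" bytes of the formats Textract documents as supported.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"%PDF": "pdf",
    b"II*\x00": "tiff",  # little-endian TIFF
    b"MM\x00*": "tiff",  # big-endian TIFF
}


def b64_to_bytes(b64_string: str) -> bytes:
    """Turn a base64 string (optionally a data: URI) into raw file bytes."""
    if b64_string.startswith("data:"):
        # e.g. "data:image/png;base64,iVBOR..." -> keep only the payload
        b64_string = b64_string.split(",", 1)[1]
    payload = "".join(b64_string.split())  # drop line breaks and spaces
    try:
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("input is not valid base64") from exc


def sniff_format(data: bytes) -> Optional[str]:
    """Return the detected format name, or None if the signature is unknown."""
    for magic, name in _SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def detect_text(b64_string: str) -> dict:
    """Call Textract with raw bytes (assumes boto3 and AWS credentials are set up)."""
    import boto3  # imported here so the helpers above work without boto3

    data = b64_to_bytes(b64_string)
    if sniff_format(data) is None:
        raise ValueError("decoded data is not a JPEG, PNG, PDF or TIFF file")
    client = boto3.client("textract")
    # Pass the raw bytes; the SDK base64-encodes the Bytes blob itself.
    return client.detect_document_text(Document={"Bytes": data})
```

If the file is read from disk instead, `open(path, "rb").read()` can be passed to `Bytes` directly with no base64 step at all.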

How to extract longitude and latitude from a link

Question: From the following link, I am trying to extract the longitude and latitude. I found similar posts, but none with the same format. I’m new to regex and text manipulation and would appreciate any guidance on how to do this in Python. The output I’d …

Total answers: 3
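The link itself is not shown here, so this is only a sketch under an assumption: that the coordinates appear somewhere in the URL as a "latitude,longitude" pair of decimal numbers, as in map links of the form `.../@40.7484405,-73.9856644,17z` or `?q=51.5007%2C-0.1246`. The URL is percent-decoded first so `%2C` becomes a comma, then a regular expression finds the first pair that falls inside valid latitude and longitude ranges. The function name `extract_lat_lng` and the example URLs are hypothetical.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote

# Two signed decimal numbers separated by a comma, not embedded in a longer number.
_LAT_LNG = re.compile(r"(?<![\d.])(-?\d{1,3}\.\d+),\s*(-?\d{1,3}\.\d+)(?![\d.])")


def extract_lat_lng(url: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from a URL, or None if no plausible pair is found."""
    text = unquote(url)  # turn "%2C" back into ","
    for match in _LAT_LNG.finditer(text):
        lat, lng = float(match.group(1)), float(match.group(2))
        # Skip number pairs that cannot be coordinates.
        if -90 <= lat <= 90 and -180 <= lng <= 180:
            return lat, lng
    return None
```

If the real link stores the values in named query parameters instead, `urllib.parse.parse_qs` on the query string is a more robust option than a regex.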

Extracting text from a PDF file using PDFMiner in python?

Question: I am looking for documentation or examples on how to extract text from a PDF file using PDFMiner with Python. It looks like PDFMiner has updated its API, and all the relevant examples I have found contain outdated code (classes and methods have changed). The libraries …

Total answers: 6
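As a sketch, assuming the maintained fork pdfminer.six is installed (`pip install pdfminer.six`): its `pdfminer.high_level.extract_text` function replaces most of the older manual setup of resource managers, converters, and interpreters. The wrapper name `pdf_to_text` and the file name in the usage line are hypothetical.

```python
from typing import Optional, Sequence


def pdf_to_text(path: str, pages: Optional[Sequence[int]] = None) -> str:
    """Extract plain text from a PDF with pdfminer.six's high-level API."""
    # Assumes pdfminer.six is installed; imported here so this module
    # still loads when the package is missing.
    from pdfminer.high_level import extract_text

    # page_numbers is zero-based; None extracts every page.
    return extract_text(path, page_numbers=pages)
```

For example, `print(pdf_to_text("report.pdf", pages=[0, 1]))` would print the text of the first two pages of a hypothetical `report.pdf`.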

PDF Parsing Using Python – extracting formatted and plain texts

Question: I’m looking for a PDF library that will allow me to extract the text from a PDF document. I’ve looked at PyPDF, and it can extract the text from a PDF document very nicely. The problem with this is that if there are tables …

Total answers: 2
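When layout matters, pdfminer.six (assumed installed) can return layout objects with coordinates through `pdfminer.high_level.extract_pages`, instead of one flat string. The sketch below collects each text block's left edge and top edge, then groups blocks whose top edges nearly match into rows. That grouping is a simple heuristic swapped in for illustration, not a pdfminer feature, and it is not a full table extractor; merged cells or multi-line cells will break it. The names `text_boxes`, `group_into_rows`, and the 3-point `tolerance` are hypothetical. For formatting, the individual `LTChar` objects inside a text block also carry a `fontname`.

```python
from typing import Iterable, List, Tuple

Box = Tuple[float, float, str]  # (x0, y1, text) for one text block


def text_boxes(path: str, page_number: int = 0) -> List[Box]:
    """Return (left, top, text) for each text block on one zero-based page."""
    # Assumes pdfminer.six is installed; imported lazily like an optional dependency.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    boxes: List[Box] = []
    for page in extract_pages(path, page_numbers=[page_number]):
        for element in page:
            if isinstance(element, LTTextContainer):
                boxes.append((element.x0, element.y1, element.get_text().strip()))
    return boxes


def group_into_rows(boxes: Iterable[Box], tolerance: float = 3.0) -> List[List[str]]:
    """Heuristic: blocks whose top edges differ by <= tolerance points share a row."""
    rows: List[Tuple[float, List[Box]]] = []
    # PDF y coordinates grow upward, so descending y1 means top-to-bottom.
    for box in sorted(boxes, key=lambda b: -b[1]):
        if rows and abs(rows[-1][0] - box[1]) <= tolerance:
            rows[-1][1].append(box)
        else:
            rows.append((box[1], [box]))
    # Within a row, order cells left-to-right by x0.
    return [[text for _, _, text in sorted(cells)] for _, cells in rows]
```

Used together, `group_into_rows(text_boxes("table.pdf"))` would give a list of rows for the first page of a hypothetical `table.pdf`; the tolerance may need tuning for a given document.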

Python module for converting PDF to text

Question: Is there any Python module to convert PDF files into text? I tried a piece of code found on ActiveState which uses pypdf, but the generated text had no spaces between words and was of no use. Asked By: cnu. Answer: Try PDFMiner. It can extract …

Total answers: 13
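Since the complaint is words running together, one knob worth knowing in pdfminer.six (assumed installed) is `LAParams(word_margin=...)`. Per its documentation, characters on the same line that are further apart than this margin (relative to character width) are treated as separate words and a space is inserted, so a smaller value inserts spaces more readily. The sketch converts every PDF in a folder to a `.txt` file next to it; the names `txt_path_for` and `convert_folder` are hypothetical, and the default of 0.1 simply mirrors pdfminer's own default.

```python
from pathlib import Path
from typing import List


def txt_path_for(pdf_path: Path) -> Path:
    """report.pdf -> report.txt in the same directory."""
    return pdf_path.with_suffix(".txt")


def convert_folder(folder: str, word_margin: float = 0.1) -> List[Path]:
    """Write a .txt file for each PDF in folder; return the paths written."""
    # Assumes pdfminer.six is installed; imported here as an optional dependency.
    from pdfminer.high_level import extract_text
    from pdfminer.layout import LAParams

    # Lower word_margin -> pdfminer inserts spaces between more closely spaced characters.
    laparams = LAParams(word_margin=word_margin)
    written: List[Path] = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        out = txt_path_for(pdf)
        out.write_text(extract_text(str(pdf), laparams=laparams), encoding="utf-8")
        written.append(out)
    return written
```

If the output still lacks spaces, trying a smaller value such as `convert_folder("pdfs", word_margin=0.05)` on a hypothetical `pdfs` folder is a reasonable next step.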