pdfminer

Extract metadata info from online pdf using pdfminer in python

Question: I want to extract some metadata from an online PDF using pdfminer, such as the title, author, and number of lines. I am trying to use a related solution discussed at https://stackoverflow.com/a/60151816/15143974, which uses …

Total answers: 2

Assign part of a string to list

Question: I’m trying to extract text from a PDF using pdfminer.six. Is there a way to find all instances of a certain phrase appearing in that string? I know a way to find the phrases and remove them, but I can’t seem to save the text around the …

Total answers: 1
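
One possible approach to the question above, sketched with Python's standard `re` module: locate every occurrence of a phrase in already-extracted text and keep a window of surrounding characters in a list. The function name `find_with_context`, the `width` parameter, and the sample string are hypothetical, and the sketch assumes the PDF text has already been extracted into a plain string.

```python
import re

def find_with_context(text, phrase, width=20):
    """Return a list of snippets: each match of `phrase` plus up to
    `width` characters of text on either side of it."""
    snippets = []
    # re.escape makes the phrase match literally, even if it
    # contains characters that are special in regular expressions.
    for match in re.finditer(re.escape(phrase), text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append(text[start:end])
    return snippets

# Hypothetical sample standing in for text extracted from a PDF.
sample = "Invoice total: 40 EUR. Shipping total: 5 EUR."
contexts = find_with_context(sample, "total", width=8)
```

Each list entry keeps the phrase together with its surroundings, so nothing has to be removed from the original string.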

Extract first page of pdf file using pdfminer library of python3

Question: I want to get the first page of data from a PDF file. I have used pdfminer and got all the data of the PDF file in the output, but I only want to fetch the first page. What should I do? My code is given …

Total answers: 1

How can I extract font color of text within a PDF in Python with PDFMiner?

Question: How can I extract font color from text within a PDF? I already tried to explore LTText or LTChar objects using PDFMiner, but it seems that this module only allows extracting font size and style, not color. Asked …

Total answers: 3

How to check if PDF is scanned image or contains text

Question: I have a large number of files; some are scanned images converted to PDF and some are full or partial text PDFs. Is there a way to check these files to ensure that we are only processing files which are scanned images and not …

Total answers: 12
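
A rough heuristic for the question above, offered only as a sketch: if no page of a PDF yields more than a handful of extractable characters, the file is probably a scanned image. The function name `looks_scanned` and the threshold `min_chars_per_page` are hypothetical, and the sketch assumes the per-page text has already been extracted (by whatever extractor is in use) into a list of strings.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Heuristic: treat a PDF as scanned when no page yields at least
    `min_chars_per_page` non-whitespace-trimmed characters of text.

    `page_texts` is assumed to be a list with one extracted-text
    string per page; the threshold is an arbitrary starting point.
    """
    for text in page_texts:
        if len(text.strip()) >= min_chars_per_page:
            # At least one page has a real text layer.
            return False
    return True

# Hypothetical per-page extraction results.
scanned_pages = ["", "  \n"]
mixed_pages = ["", "This page has a real text layer."]
```

With this rule a partially textual PDF counts as not scanned; a stricter policy could instead require every page to pass the threshold.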