How to access data from pdf forms with python?

Question:

I need to access data from PDF form fields. I tried the PyPDF2 package with this code:

import PyPDF2

reader = PyPDF2.PdfReader('formular.pdf')
print(reader.pages[0].extract_text())

But this only gives me the regular text of the PDF page, not the contents of the form fields.

Does anyone know how to read text from the form fields?

Asked By: Antonio Kallai


Answers:

There are libraries in Python through which you can access PDF data. PDF is not a raw data format like CSV, TXT or TSV, so Python can't read the data inside PDF files directly without such a library.

There is a Python library named slate (see the Slate documentation). Read this documentation; I hope it answers your question.

Answered By: Anurag Misra

You can use the getFormTextFields() method (called get_form_text_fields() on PdfReader in PyPDF2 2.0 and later) to return a dictionary of form fields (see https://pythonhosted.org/PyPDF2/PdfFileReader.html). Use the dictionary keys (the field names) to access the values (the field values). The following example might help:

from PyPDF2 import PdfReader  # PyPDF2 >= 2.0 (in 1.x: PdfFileReader and getFormTextFields())

infile = "myInputPdf.pdf"
pdf_reader = PdfReader(infile)  # accepts a file path or an open binary file object

dictionary = pdf_reader.get_form_text_fields()  # returns a python dictionary of text fields
my_field_value = dictionary['my_field_name'] or ''  # use field name (dictionary key) to access field value; an empty field's value is None

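
Note that get_form_text_fields() only returns text fields (fields whose /FT entry is /Tx), so checkboxes, radio buttons and choice lists are left out. A minimal sketch, assuming PyPDF2 2.0 or later, that reads the /V (value) entry of every field returned by PdfReader.get_fields() instead; the helper names field_values and read_all_form_values are hypothetical:

```python
from typing import Any, Mapping, Optional


def field_values(fields: Optional[Mapping[str, Any]]) -> dict:
    """Map each form field name to its value as a string.

    `fields` is assumed to have the shape returned by PdfReader.get_fields():
    a dict of field name -> field dictionary, or None if the PDF has no form.
    Fields without a "/V" entry (not filled in) map to an empty string.
    """
    result = {}
    for name, field in (fields or {}).items():
        value = field.get("/V")
        result[name] = "" if value is None else str(value)
    return result


def read_all_form_values(path: str) -> dict:
    """Read the values of all form fields (text, checkbox, choice...) in a PDF."""
    # Imported inside the function so field_values() works without PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return field_values(reader.get_fields())
```

With this approach, checkbox and radio button values come back as PDF names such as "/Off" or the form's own "on" name (often "/Yes"), so compare against those strings rather than True/False.
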
Answered By: tromar