How to access data from PDF forms with Python?
Question:
I need to access data from pdf form fields. I tried the package PyPDF2 with this code:
import PyPDF2
reader = PyPDF2.PdfReader('formular.pdf')
print(reader.pages[0].extract_text())
But this gives me only the text of the normal pdf data, not the form fields.
Does anyone know how to read text from the form fields?
Answers:
There are libraries in Python through which you can access PDF data. A PDF is not a raw data format like csv, txt, or tsv, so Python can't directly read the data inside PDF files.
One such Python library is slate. Read the Slate documentation; I hope you will find the answer to your question there.
You can use the getFormTextFields() method to return a dictionary of the form's text fields (see https://pythonhosted.org/PyPDF2/PdfFileReader.html). Use the dictionary keys (the field names) to access the values (the field values). The following example might help:
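from PyPDF2 import PdfFileReader  # legacy class name; PyPDF2 3.0 and later use PdfReader

infile = "myInputPdf.pdf"
# Open in binary mode; the with-block closes the file once the fields are read
with open(infile, "rb") as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    dictionary = pdf_reader.getFormTextFields()  # returns a Python dictionary
# Use the field name (dictionary key) to access the field value (dictionary value)
my_field_value = str(dictionary['my_field_name'])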
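Since the question already uses the newer PdfReader class, the same idea can be sketched with the newer method names. This is only a sketch: it assumes a PyPDF2 release in which PdfReader provides get_form_text_fields() (the snake_case successor of getFormTextFields()). The helper names filled_fields and read_form_text_fields are made up for illustration and are not part of PyPDF2.

```python
from typing import Dict, Optional


def filled_fields(fields: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Keep only the form fields that actually contain a value."""
    return {
        name: str(value)
        for name, value in fields.items()
        if value not in (None, "")
    }


def read_form_text_fields(path: str) -> Dict[str, str]:
    """Return the non-empty text form fields of the PDF at `path`."""
    # Imported here so filled_fields() stays usable without PyPDF2 installed.
    # Assumption: a PyPDF2 version that ships PdfReader.get_form_text_fields().
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # Maps each text field name to its value (None when the field is empty).
    return filled_fields(reader.get_form_text_fields())
```

Calling read_form_text_fields('formular.pdf') on the file from the question should then give a plain dictionary, so a single value can be read with .get('my_field_name'). Only text fields are covered; for other field types such as checkboxes, reader.get_fields() may be the better starting point.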