How to edit/modify text in a PDF
Question:
I am working on my final year project: a website where users can come and read PDFs. I am adding features such as converting currency amounts to the user's local currency. I am using Flask and PyMuPDF, and I don't know how to modify the text in a PDF.
Can anyone help me with this problem?
I read here that PyMuPDF or PyPDF2 can do this, but I didn't find any solution for replacing text.
Answers:
Using the redaction facility of PyMuPDF is probably the appropriate way to do this.
The approach:
- Identify the location of the text to replace
- Erase the text and replace it using redactions
Care must be taken to get hold of the original font, and to handle new text that is longer or shorter than the original.
import fitz  # import PyMuPDF
doc = fitz.open("myfile.pdf")
page_number = 0  # 0-based; set this to the page you want to edit
page = doc[page_number]
# suppose you want to replace all occurrences of some text
disliked = "delete this"
better = "better text"
hits = page.search_for(disliked)  # list of rectangles where to replace
for rect in hits:
    page.add_redact_annot(rect, better, fontname="helv", fontsize=11,
                          align=fitz.TEXT_ALIGN_CENTER)  # more parameters are available
page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)  # apply once, after the loop; don't touch images
doc.save("replaced.pdf", garbage=3, deflate=True)
This works well with short text and medium quality expectations.
With some more effort, the original font properties, color, font size, etc. can be identified to produce a close-to-perfect result.