Removing Paragraph From Cell In Python-Docx
Question:
I am attempting to create a table with a two-row header that uses a simple template format for all of the styling. The two-row header is required because I have headers that are the same under two primary categories. It appears that the only way to handle this within Word, so that the document formats and flows with a repeating header across pages, is to nest a two-row table inside the header row of the main content table.
In python-docx a table cell is always created with a single empty paragraph element. For my use case I need to remove this empty paragraph element entirely, not simply clear it with an empty string. Otherwise I get a line break above my nested table, which ruins the illusion of a single table.
So the question is: how do you remove the empty paragraph?
If you know of a better way to implement the two-row header, that would also be appreciated.
Answers:
While Paragraph.delete() is not implemented yet in python-docx, there is a workaround function documented here: https://github.com/python-openxml/python-docx/issues/33#issuecomment-77661907
Note that a table cell must always end with a paragraph. So you’ll need to add an empty one after your table, otherwise I believe you’ll get a so-called “repair-step” error when you try to load the document.
It is probably worth a try without the extra paragraph just to confirm. I expect it would look better without it, but the last time I tried that I got the error.
As @scanny said above, you can delete a paragraph by passing it to a self-defined delete function.
As a supplement, here is how to delete multiple paragraphs at once.
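import docx


def delete_paragraph(paragraph):
    # Detach the underlying XML element from its parent, then clear the
    # proxy's references so it cannot be used by accident afterwards.
    p = paragraph._element
    p.getparent().remove(p)
    paragraph._p = paragraph._element = None


def remove_multiple_para(doc):
    paragraphs = list(doc.paragraphs)  # snapshot, so deletions don't shift indices
    doomed = set()
    for i, paragraph in enumerate(paragraphs):
        if 'xxxx' in paragraph.text:
            # mark the related 4 lines: one before the match, the match, two after
            doomed.update(range(max(i - 1, 0), min(i + 3, len(paragraphs))))
    for j in sorted(doomed):
        delete_paragraph(paragraphs[j])
    doc.save('outputDoc.docx')


doc = docx.Document('./inputDoc.docx')
remove_multiple_para(doc)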