Removing Paragraph From Cell In Python-Docx

Question:

I am attempting to create a table with a two row header that uses a simple template format for all of the styling. The two row header is required because I have headers that are the same under two primary categories. It appears that the only way to handle this within Word so that a document will format and flow with repeating header across pages is to nest a two row table into the header row of a main content table.

In python-docx, a table cell is always created with a single empty paragraph element. For my use case I need to remove this empty paragraph entirely, not simply clear it with an empty string. Otherwise I get a line break above my nested table that ruins the illusion of a single table.

So the question is how do you remove the empty paragraph?

If you know of a better way to handle the two row header implementation… that would also be appreciated info.

Asked By: carruthd

||

Answers:

While Paragraph.delete() is not implemented yet in python-docx, there is a workaround function documented here: https://github.com/python-openxml/python-docx/issues/33#issuecomment-77661907

Note that a table cell must always end with a paragraph, so you’ll need to add an empty one after your table. Otherwise I believe you’ll get a so-called “repair-step” error when you try to load the document.

It’s probably worth a try without the extra paragraph just to confirm. I expect it would look better without it, but the last time I tried that I got the error.

Answered By: scanny

As @scanny said, you can delete a paragraph by passing it to a self-defined delete function.

As a supplement, here is how to delete multiple paragraphs at once.

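import docx

def delete_paragraph(paragraph):
    # Detach the paragraph's XML element from its parent, then drop the
    # proxy's references so it cannot be used by accident afterwards.
    p = paragraph._element
    p.getparent().remove(p)
    paragraph._p = paragraph._element = None

def remove_multiple_para(doc):
    # doc.paragraphs builds a new list on every access, so take one snapshot
    # and decide what to delete before removing anything.
    paragraphs = doc.paragraphs
    doomed = set()
    for i, paragraph in enumerate(paragraphs):
        if 'xxxx' in paragraph.text:
            # the related 4 lines: the one before the match, the match
            # itself and the two after it, clamped to the document bounds
            doomed.update(range(max(i - 1, 0), min(i + 3, len(paragraphs))))
    for i in doomed:
        delete_paragraph(paragraphs[i])
    doc.save('outputDoc.docx')

doc = docx.Document('./inputDoc.docx')
remove_multiple_para(doc)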
Answered By: Jie Yin