Make the last paragraph the active pointer

Question:

I am writing my research work in Markdown, and my institution requires me to submit it in Word (.docx) format. I decided to use the python-docx package to automate this task.

However, I am struggling with a specific task: adding content to the end of the file.

So here I am now.

from docx import Document


def merge(docx, files):
    """Merge other docx files into the parent docx document."""
    docx._body.clear_content()

    elements = []
    for file in files:
        donor = Document(file)
        donor.add_page_break()

        for element in donor.element.body:
            elements.append(element)

    for element in elements:
        docx.element.body.append(element)

# base styles 
document = Document("docx/base.docx")

# adding two preformatted files with really fragile formatting.
merge(document, ["docx/Tytulka.docx", "docx/Zavdania.docx"])

document.add_paragraph("hey")
document.save("tmp_result.docx")

What I get in tmp_result.docx is: hey -> content from the 1st file -> content from the 2nd file.

I checked the code and was able to use insert_paragraph_after*, which added a paragraph at the end of the file.

So here is the question: how can I make the document object use the last paragraph as the current insertion point? The default behavior should work, but I have changed the document structure with my merged documents, and new content is added at the beginning of the file.

I tried the following trick, but the result was unexpectedly unsatisfying**, after which I decided to stop playing with APIs (both Word’s and python-docx’s) that I don’t understand.

from docx import Document

# trick I use to move the active paragraph to the end.

def merge(docx, files):
    docx._body.clear_content()

    elements = []
    for file in files:
        donor = Document(file)
        donor.add_page_break()

        for element in donor.element.body:
            elements.append(element)

    for element in elements:
        # moving the last paragraph to the end of the file.
        tmp = docx.element.body[-1]
        docx.element.body[-1] = element
        docx.element.body.append(tmp)

# base styles 
document = Document("docx/base.docx")

# adding two preformatted files with really fragile formatting.
merge(document, ["docx/Tytulka.docx", "docx/Zavdania.docx"])

document.add_paragraph("hey")
document.save("tmp_result.docx")

I wish I could spend more time digging into the Word specification and the python-docx code, but I don’t really have the time. So here is the question:

How can I point python-docx to write after a particular (the last) paragraph?

ANSWER/SOLUTION credited to scanny

The problem with just appending to the body element is there is a "sentinel" sectPr element at the end of the body and it needs to stay there (i.e. not have paragraphs after it). (by @scanny)

With this valuable info, I did the following:


from docx import Document


def merge(docx, files):
    """
    Merge existing docx files into docx.
    """
    docx._body.clear_content()

    elements = []
    for file in files:
        donor = Document(file)
        donor.add_page_break()

        # all except the donor's sentinel sectPr
        for element in donor.element.body[:-1]:
            elements.append(element)

    # moving the docx sentinel to the end and adding elements from
    # donors
    for element in elements:
        tmp = docx.element.body[-1]
        docx.element.body[-1] = element
        docx.element.body.append(tmp)


if __name__ == "__main__":

    # adding the title page and the preformatted docx files.
    document = Document("docx/base.docx")
    merge(document, ["docx/Tytulka.docx", "docx/Zavdania.docx"])

    # document.add_paragraph("hey")

    # open for tests
    # os.system("kill -9 $(ps -e -o pid,args | grep Word.app | awk '{print $1}' | head -1)")
    # this part accepts the current document,
    # transforms the markdown files that match the pattern by adding them
    # to the docx, then saves and opens the document.
    # Builder (not shown) is a custom Markdown-to-docx helper, not part of python-docx.
    Builder(document).build("texts/13*.md").save("tmp_result.docx").open()

As a result: content of the 1st file -> content of the 2nd file -> Markdown-generated content.

Win! Win! Win!


  • * You won’t find the method insert_paragraph_after in the package, but it is exactly the same as insert_paragraph_before, with the only difference that the new paragraph is created and inserted after the current one (see the add_p_before method of the CT_P class; you can use addnext of BaseOxmlElement).
  • ** The result of moving the current pointer p was: content from the 1st file -> hey -> content from the 2nd file, which doesn’t make sense to me (since I don’t really know the APIs of Word and python-docx).
Asked By: Oleg Butuzov


Answers:

Well, I’m not sure I understand exactly what you’re trying to do, but I think what you’re asking for is this:

last_p_in_document = document.paragraphs[-1]._p
# new_p: a new paragraph element (CT_P) you have built
last_p_in_document.addnext(new_p)
last_p_in_document = new_p
# ---etc.---

The problem with just appending to the body element is there is a “sentinel” sectPr element at the end of the body and it needs to stay there (i.e. not have paragraphs after it). The other approach you could take would be to find that element using sectPr = body[-1] and then use sectPr.addprevious(next_element_to_be_added), which actually seems like the simpler approach. The sectPr will continue to be the last child of body (so you don’t have to reset it after every element insertion) and you can add table elements as well as paragraph elements with the same code.

Answered By: scanny