Make last paragraph – active pointer
Question:
I am writing my research work in Markdown, but my institution requires it to be submitted in Word format, so I decided to automate the conversion with the python-docx package.
However, I am struggling with one specific task: adding content to the end of the file.
Here is what I have so far.
from docx import Document

def merge(docx, files):
    """Merge other docx files into the parent docx document."""
    docx._body.clear_content()
    elements = []
    for idx, file in enumerate(files):
        donor = Document(file)
        donor.add_page_break()
        for element in donor.element.body:
            elements.append(element)
    for element in elements:
        docx.element.body.append(element)

# base styles
document = Document("docx/base.docx")
# adding two preformatted files with really fragile formatting.
merge(document, ["docx/Tytulka.docx", "docx/Zavdania.docx"])
document.add_paragraph("hey")
document.save("tmp_result.docx")
So what I get in tmp_result.docx is: hey -> content from 1st file -> content from 2nd file.
I checked the code and was able to use insert_paragraph_after*, which added a paragraph at the end of the file.
So here is the question: how can I ask (or trick) the document object into using the last paragraph as its current element pointer? Its default behavior is supposed to do that, but I have changed the structure by merging documents, and new content is now added at the first paragraph of the file.
I tried the following trick, but the result was unexpectedly unsatisfying**, after which I decided to stop playing with APIs I don't understand (both Word and python-docx).
# trick I use to move active paragraph to the end.
def merge(docx, files):
    docx._body.clear_content()
    elements = []
    for idx, file in enumerate(files):
        donor = Document(file)
        donor.add_page_break()
        for element in donor.element.body:
            elements.append(element)
    for element in elements:
        # moving last paragraph to the end of file.
        tmp = docx.element.body[-1]
        docx.element.body[-1] = element
        docx.element.body.append(tmp)

# base styles
document = Document("docx/base.docx")
# adding two preformatted files with really fragile formatting.
merge(document, ["docx/Tytulka.docx", "docx/Zavdania.docx"])
document.add_paragraph("hey")
document.save("tmp_result.docx")
I wish I could spend more time digging into the Word specification and the python-docx code, but I don't really have that time. So here is the question:
How do I point python-docx to write after a particular (last) paragraph?
ANSWER/SOLUTION credited to scanny
The problem with just appending to the body element is there is a "sentinel" sectPr element at the end of the body and it needs to stay there (like not have paragraphs after it). by @scanny
With this valuable information, I did the following.
def merge(docx, files):
    """
    Merge existing docx files into docx.
    """
    docx._body.clear_content()
    elements = []
    for idx, file in enumerate(files):
        donor = Document(file)
        donor.add_page_break()
        # all except donor sentinel sectPr
        for element in donor.element.body[:-1]:
            elements.append(element)
    # moving docx sentinel to the end and adding elements from
    # donors
    for element in elements:
        tmp = docx.element.body[-1]
        docx.element.body[-1] = element
        docx.element.body.append(tmp)

if __name__ == "__main__":
    # adding title page and preformatted docx files.
    document = Document("docx/base.docx")
    merge(document, ["docx/Tytulka.docx", "docx/Zavdania.docx"])
    # document.add_paragraph("hey")
    # open for tests
    # os.system("kill -9 $(ps -e -o pid,args | grep Word.app | awk '{print $1}' | head -1)")
    # this part accepts the current document,
    # transforms markdown files that fit the pattern by adding them
    # to the docx,
    # then saves and opens the document.
    # (Builder is not part of python-docx; its definition is not shown here.)
    Builder(document).build("texts/13*.md").save("tmp_result.docx").open()
As a result: content of the 1st file -> content of the 2nd file -> Markdown-generated content.
Win! Win! Win!
* You won't find an insert_paragraph_after method in the package, but it is exactly the same as insert_paragraph_before, the only difference being that the paragraph is created and inserted after the current one instead of before it (see the add_p_before method of the CT_P class; you can use the addnext method of BaseOxmlElement).
** The result of moving the current pointer was: content from 1st file -> hey -> content from 2nd file, which doesn't make sense to me (since I don't really know the APIs of Word and python-docx).
Answers:
Well, I’m not sure I understand exactly what you’re trying to do, but I think what you’re asking for is this:
last_p_in_document = document.paragraphs[-1]._p
# new_p is the <w:p> element you want to insert
last_p_in_document.addnext(new_p)
last_p_in_document = new_p
# ---etc.---
The problem with just appending to the body element is there is a "sentinel" sectPr element at the end of the body and it needs to stay there (that is, it must not have paragraphs after it). The other approach you could take would be to find that element using sectPr = body[-1] and then use sectPr.addprevious(next_element_to_be_added), which actually seems like the simpler approach. The sectPr will continue to be the last child of body (so you don't have to reset it after every element insertion) and you can add table elements as well as paragraph elements with the same code.