How do I delete particular pages from a DOCX file?

Question:

I have quite a large collection of DOCX documents, and I need to delete all but the first page in each of them. From what I have read, python-docx does not support this since it has no notion of pages. One option I have considered is converting to PDF, deleting the pages, and converting back to DOCX, but I am concerned this would break some of the formatting, and it would probably be slow for so many documents. What is my best option here?

Something like:

# Pseudocode: `pages` is a hypothetical list of page objects
del pages[1:]
Asked By: AmanKP

||

Answers:

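You cannot delete particular pages from a DOCX file at the data level alone because you cannot even reliably reference pages at the data level: page boundaries are computed by the rendering application at layout time rather than stored as structure in the file.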

You’ll have to change your access model away from depending upon pagination, or hack a solution based on Word Automation with its licensing and server operation limitations. Moving to a non-page-based reference model such as one based on paragraphs or sections is your best option. Such models are more compatible with modern content management requirements across devices with widely varying display sizes anyway.
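To make the paragraph-based alternative concrete, here is a minimal sketch that keeps only the first few block-level elements (paragraphs and tables) of a document body and drops the rest. It assumes python-docx is installed and that a fixed block count is an acceptable stand-in for "the first page", which will not hold for every document. The names truncate_body, truncate_docx, and keep_blocks are hypothetical and not part of any library.

```python
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SECT_PR = f"{{{W_NS}}}sectPr"


def truncate_body(body, keep_blocks):
    """Remove every block after the first `keep_blocks`, in place.

    `body` is the <w:body> element. The trailing <w:sectPr> (page size,
    margins) is not counted as a block and is always kept.
    Returns the number of blocks removed.
    """
    blocks = [child for child in body if child.tag != SECT_PR]
    doomed = blocks[keep_blocks:]
    for child in doomed:
        body.remove(child)
    return len(doomed)


def truncate_docx(src_path, dst_path, keep_blocks):
    """Hypothetical helper: load with python-docx, truncate, save a copy."""
    from docx import Document  # third-party: pip install python-docx

    doc = Document(src_path)
    truncate_body(doc.element.body, keep_blocks)
    doc.save(dst_path)
```

Calling truncate_docx("in.docx", "out.docx", keep_blocks=10) would write a copy containing only the first ten blocks. Because the cut point is a block count rather than a page, the result may be shorter or longer than one rendered page, so it is worth spot-checking a few outputs before running this over a whole collection.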

Answered By: kjhughes