How can I create a Word document using Python?

Question:

I’d like to create a Word document using Python, however, I want to re-use as much of my existing document-creation code as possible. I am currently using an XSLT to generate an HTML file that I programatically convert to a PDF file. However, my client is now requesting that the same document be made available in Word (.doc) format.

So far, I haven’t had much luck finding any solutions to this problem. Is anyone aware of an open source library (or *gulp* a proprietary solution) that may help resolve this issue?

NOTE: All possible solutions must run on Linux. I believe this eliminates pywin32.

Asked By: Huuuze

||

Answers:

Can you write is as the WordML XML files and zip it up into the .docx format? All your client would need is the Word 2007 filter if they aren’t on Office 2007 already.

There are many examples out there.

You can also load XML directly into Word, starting with 2003, or so I’ve been told.

Answered By: lavinio

A couple ways you can create Word documents using Python:

EDIT:

Since COM is out of the question, I suggest the following (inspired by @kcrumley’s answer):

Using the UNO library to automate Open Office from python, open the HTML file in OOWriter, then save as .doc.

EDIT2:

There is now a pure Python python-docx project that looks nice (I have not used it).

Answered By: codeape

1) If you want to just stick another step on the end of your current pipeline, there are several options out there now for converting PDF files to Word files. I haven’t tried 123PDFConverter, but the CNET Editors recommend it (same link); it has a free trial; and it supports automation. As with any 3rd-party file converter, your mileage may vary, depending how complicated your PDFs are, and how good the software actually is.

2) Building on codeape’s COM automation suggestion, if you COM automate Word, you can open your actual HTML file in Word, and call the “Save As” command, to save it as a DOC file.

Answered By: Kevin Crumley

I have had to do something similar with python as well. It is far more manual work than I want, but documents created with pyRTF were causing Word and OpenOffice to crash and I didn’t have the motivation to try to figure it out.

I have found it simplest (but not ideal) to create a Word document template with the styles I want. Then my Python creates an HTML file whose <p> styles are labeled after the Word styles. Then I open the HTML file in Word and open the template in Word. I cut and paste all text from the HTML file into the template, and Word re-formats it all according to the styles I had set up previously. That works for the occasional file in my situation. It might not work for your situation. FYI.

Answered By: Q R

I tried python-docx with succes, it enables you to make and edit docx within Python

Answered By: mbk
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.