Reading/Writing MS Word files in Python

Question:

Is it possible to read and write Word (2003 and 2007) files in Python without using a COM object?
I know that I can:

with open('c:file.doc', "w") as f:
    f.write(text)

but Word will read it as an HTML file, not as a native .doc file.

Asked By: UnkwnTech


Answers:

doc (Word 2003 in this case) and docx (Word 2007) are different formats: the former is a binary format, while the latter is a ZIP archive of XML and image files. It should therefore be possible to write docx files by manipulating the contents of those XML files. However, I don’t see how you could read and write a doc file without some type of COM component interface.
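
To make the archive point concrete, here is a minimal sketch that pulls paragraph text out of a .docx using only the standard library. It assumes the usual layout in which the main body is stored as word/document.xml inside the archive; the helper name docx_paragraphs is made up for illustration. It reads only that one part, so headers, footers, and footnotes (stored in separate parts) are not covered.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the text of each paragraph in a .docx body (hypothetical helper)."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> element is a paragraph; its visible text sits in <w:t> nodes.
    for para in root.iter(W_NS + "p"):
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

Writing works the same way in reverse (edit the XML, then write a new archive), but a valid file also needs the content-type and relationship parts, which is why a dedicated library is usually the easier route.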

Answered By: fuentesjr

I’d look into IronPython, which intrinsically has access to Windows/Office APIs because it runs on the .NET runtime.

Answered By: auramo

See python-docx and its official documentation.

This has worked very well for me.

Note that this does not work for .doc files, only .docx files.

Answered By: Damian

If you only want to read, the simplest approach is to use the Linux soffice command to convert the file to text, and then load the text into Python.
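
The conversion could be sketched roughly as follows. This is not the original answer's code, which is missing here; it assumes LibreOffice's soffice binary is on the PATH and that the txt:Text filter produces output readable as UTF-8 (the actual encoding can depend on the filter and locale). The names build_convert_command and doc_to_text are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir):
    """Build the soffice command line that converts doc_path to plain text."""
    return [
        "soffice", "--headless",
        "--convert-to", "txt:Text",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def doc_to_text(doc_path, out_dir):
    """Convert a Word file with soffice and return the resulting text."""
    subprocess.run(build_convert_command(doc_path, out_dir), check=True)
    # soffice names the output after the input file, with a .txt extension.
    txt_path = Path(out_dir) / (Path(doc_path).stem + ".txt")
    # Assumption: the output decodes as UTF-8; undecodable bytes are replaced.
    return txt_path.read_text(encoding="utf-8", errors="replace")
```

Because the conversion is done by LibreOffice rather than Python, this approach handles both .doc and .docx input, at the cost of losing all formatting.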

Answered By: markling