Extracting images from a .RTF file with Python

Question:

Does anyone know how to extract or copy images from a .rtf file?

I have tried to look for a solution, but from what I found, all of the libraries and articles people cite either no longer exist or have no documentation.

Asked By: Pancake

Answers:

Yes, it is possible, but you may have to use an older version of Python, something like 2.7, for the pyth library used here:

import pyth.plugins.rtf15.reader as reader
import base64

# Rtf15Reader.read is a classmethod that takes the open file
with open('Document.rtf', 'rb') as rtf_file:
    doc = reader.Rtf15Reader.read(rtf_file)
for element in doc.content:
    if element.__class__.__name__ == 'Image':
        image_data = base64.b64decode(element.binary_data)
        # no f-string here: f-strings are a syntax error on Python 2.7
        with open(element.filename, 'wb') as f:
            f.write(image_data)
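
If pyth is not an option, the pictures can sometimes be pulled straight out of the RTF text. The following is only a minimal sketch and not part of the approach above. It assumes the images are stored as hex-encoded `\pict` groups marked with `\pngblip` or `\jpegblip`; it ignores `\bin` binary data, OLE objects and metafiles. The function name `extract_pictures` is made up for this example.

```python
import re

# Control words that mark the picture format inside a \pict group.
BLIP_EXTENSIONS = {"pngblip": ".png", "jpegblip": ".jpg"}


def extract_pictures(rtf_text):
    """Return a list of (extension, image_bytes) for PNG/JPEG pict groups."""
    pictures = []
    start = rtf_text.find("{\\pict")
    while start != -1:
        depth = 0
        top_level = []
        pos = start
        # Walk to the brace that closes this group, keeping only the
        # characters that sit directly inside it (nested groups are skipped).
        while pos < len(rtf_text):
            char = rtf_text[pos]
            if char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth == 0:
                    break
            elif depth == 1:
                top_level.append(char)
            pos += 1
        body = "".join(top_level)
        words = re.findall(r"\\([a-z]+)", body)
        extension = next(
            (BLIP_EXTENSIONS[w] for w in words if w in BLIP_EXTENSIONS), None
        )
        # Drop control words such as \picw100, then all whitespace,
        # which should leave only the hex-encoded image data.
        hex_data = re.sub(r"\\[a-z]+-?\d* ?", "", body)
        hex_data = re.sub(r"\s+", "", hex_data)
        if extension and hex_data:
            try:
                pictures.append((extension, bytes.fromhex(hex_data)))
            except ValueError:
                pass  # not clean hex data, skip this group
        start = rtf_text.find("{\\pict", pos)
    return pictures
```

A possible way to use it: read the file with `open('Document.rtf', encoding='latin-1').read()`, pass the text to `extract_pictures`, and write each returned byte string to a file with the matching extension.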
Answered By: Gabriel Murilo

Since I didn’t find a straightforward way to extract images from a .rtf file, I came up with a workaround.

I used the win32com library to open the file in Word and then saved it as a .docx:

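import win32com.client

# Requires Windows with Microsoft Word installed
word = win32com.client.Dispatch('Word.Application')
doc = word.Documents.Open(RtfFilePath)
doc.SaveAs(saveDocxPath, FileFormat=16)  # 16 = default Word format (.docx)
doc.Close()
word.Quit()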

This way you can use docx2txt and other libraries that extract images from Word files:

import docx2txt
text = docx2txt.process("/path/your_word_doc.docx", '/home/example/img/')  # second argument: folder for the extracted images

I also found that some images are stored as .wmf files, and these can’t be extracted this way. As a workaround, I unpack the .docx (which is a ZIP archive) with the tar command:

import subprocess
subprocess.run(["tar", "-x", "-f", FileToExtract, "-C", TargetFolder], check=True)

The extracted images will be located in TargetFolder\word\media.
You can convert them into another image format using the Pillow library (note that Pillow can only load WMF files on Windows):

from PIL import Image

Image.open("image.wmf").save("image.png")
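
The tar step works because a .docx file is a ZIP archive, so the same extraction can probably be done without an external command. Here is a minimal sketch using the standard-library zipfile module; the function name `extract_docx_media` and its arguments are made up for this example, and it assumes the images live under `word/media/` inside the archive, as described above.

```python
import zipfile
from pathlib import Path


def extract_docx_media(docx_path, target_dir):
    """Copy every file stored under word/media/ in the .docx into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip directory entries and everything outside word/media/
            if name.startswith("word/media/") and not name.endswith("/"):
                out_path = target / Path(name).name
                out_path.write_bytes(archive.read(name))
                saved.append(out_path)
    return saved
```

Calling `extract_docx_media(saveDocxPath, TargetFolder)` would return the paths of the written files, which can then be passed to `Image.open(...)` for conversion.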
Answered By: Pancake