How to extract text from the plain text part of multipart/alternative?

Question:

# main.py
import email
from email.iterators import _structure
import sys
msg = email.message_from_string(sys.stdin.read())
_structure(msg)

./main.py <<EOF
From:  Nathaniel Borenstein <[email protected]>
To: Ned Freed <[email protected]>
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42


--boundary42
Content-Type: text/plain; charset=us-ascii

...plain text version of message goes here....

--boundary42
Content-Type: text/richtext

.... richtext version of same message goes here ...
--boundary42
Content-Type: text/x-whatever

.... fanciest formatted version of same  message  goes  here
...
--boundary42--
EOF

The output is:

multipart/alternative
    text/plain
    text/richtext
    text/x-whatever

I can use the email module to print the structure of a multipart message like the one above. How can I extract the text of the text/plain part? (In this particular example, it should be "...plain text version of message goes here....".)

Asked By: user1424739


Answers:

For a multipart message, msg.get_payload() returns a list of the message's sub-parts. Iterate over that list until you find the text/plain part:

# main.py
import email
import sys

msg = email.message_from_string(sys.stdin.read())

for part in msg.get_payload():
    if part.get_content_type() == 'text/plain':
        print(part.get_payload())

Given your sample input, the above code produces as output:

...plain text version of message goes here....

You could instead use email.iterators.typed_subpart_iterator, like this:

# main.py
import email
import email.iterators
import sys

msg = email.message_from_string(sys.stdin.read())

for part in email.iterators.typed_subpart_iterator(msg, maintype='text', subtype="plain"):
    print(part.get_payload())

This produces the same output as the earlier example.
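Note that the first loop only looks at top-level parts, and both examples print the payload without undoing any Content-Transfer-Encoding. As a minimal sketch, assuming the message may contain nested multipart containers or encoded parts, you could walk the whole tree and decode each text/plain part. The helper name plain_text_parts and the RAW string (a trimmed copy of the sample message from the question) are illustrative only:

```python
import email

# Trimmed copy of the sample message from the question (illustrative only).
RAW = """\
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42

--boundary42
Content-Type: text/plain; charset=us-ascii

...plain text version of message goes here....

--boundary42
Content-Type: text/richtext

.... richtext version of same message goes here ...
--boundary42--
"""


def plain_text_parts(msg):
    """Yield the decoded text of every text/plain part, nested or not."""
    # walk() visits the message itself and then every sub-part, depth first.
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        # Fall back to us-ascii when the part declares no charset.
        charset = part.get_content_charset() or "us-ascii"
        yield payload.decode(charset, errors="replace")


msg = email.message_from_string(RAW)
texts = list(plain_text_parts(msg))
for text in texts:
    print(text, end="")
```

For the sample message this yields a single string, since only one part has the type text/plain.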


You mentioned that docs.python.org/3/library/email.parser.html says get_body() can work.

The get_body method is only available on email.message.EmailMessage, but by default email.message_from_string returns a legacy email.message.Message object (see the email.parser documentation).

In order to get an email.message.EmailMessage object, you need to pass in a policy parameter:

import email
import email.policy
import sys

# policy=email.policy.default makes the parser return an EmailMessage
msg = email.message_from_string(sys.stdin.read(), policy=email.policy.default)

print(msg.get_body().get_payload())

This will also produce the same output as the first example.
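One caveat: with no arguments, get_body() uses the preference list ('related', 'html', 'plain'), so it would return an HTML part if the message had one. A hedged sketch that asks only for plain text and uses get_content() to get a decoded string might look like this; RAW is again a trimmed copy of the sample message from the question, used here only for illustration:

```python
import email
import email.policy

# Trimmed copy of the sample message from the question (illustrative only).
RAW = """\
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42

--boundary42
Content-Type: text/plain; charset=us-ascii

...plain text version of message goes here....

--boundary42
Content-Type: text/richtext

.... richtext version of same message goes here ...
--boundary42--
"""

# The default policy makes the parser build an EmailMessage object.
msg = email.message_from_string(RAW, policy=email.policy.default)

# Restrict the search to text/plain; get_body() returns None if none is found.
body = msg.get_body(preferencelist=("plain",))

# get_content() decodes the transfer encoding and charset and returns a str.
text = body.get_content() if body is not None else None
print(text)
```

Checking for None matters for messages that have no text/plain alternative at all.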

Answered By: larsks