Searching for UTF-8 encoded subjects with imaplib

Question:

I have working code that fetches mail bodies, and I want to filter by subject using a non-ASCII string. Other forums suggest using the `.uid` method for this, but its behavior doesn't make sense to me.

Current code:

```
import imaplib
import email

username = "secret"  # placeholder credentials
password = "secret"

imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login(username, password)

status, messages = imap.select("INBOX", readonly=True)

# Plain SEARCH: works for ASCII text only
res, msg = imap.search(None, 'HEADER Subject "string to be encoded with UTF-8"')
```

Suggested code:

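```
import imaplib
import email

username = "secret"  # placeholder credentials
password = "secret"

imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login(username, password)

status, messages = imap.select("INBOX", readonly=True)

# The UTF-8 bytes are sent as an IMAP literal appended to the next command
imap.literal = "string to be encoded with UTF-8".encode("utf-8")
res, msg = imap.uid("SEARCH", "CHARSET", "UTF-8", "SUBJECT")
```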

The suggested code runs, but the returned list (`msg[0]`) contains numbers that are larger than the number of messages in the mailbox. When I use `.search` instead, valid indices are returned, but only as long as I search for ASCII strings (non-ASCII strings are rejected whether or not I encode them as UTF-8). Because of this I don't understand how `.uid` behaves. I'd be grateful if someone could point me in the right direction.
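
Some background on what `imap.literal` does: in IMAP (RFC 3501), text that cannot go in a quoted string, such as UTF-8 bytes, is sent as a literal: the client writes `{n}` with the payload length in bytes, waits for the server's `+` continuation, then sends exactly n raw bytes. imaplib converts plain `str` arguments to bytes as ASCII by default, so non-ASCII text passed straight to `.search` fails. The sketch below only builds the bytes of such a command to show the framing; `build_literal_search` is a hypothetical helper, not part of imaplib, and it leaves out the command tag and the continuation round-trip.

```python
from typing import Tuple


def build_literal_search(criterion: str, text: str, charset: str = "UTF-8") -> Tuple[bytes, bytes]:
    """Return (command line, literal payload) for a UID SEARCH sending `text` as a literal.

    Illustrative only: the tag prefix and the server's "+" continuation are omitted.
    """
    payload = text.encode(charset)  # the literal carries raw bytes, not ASCII
    # The {n} marker states the payload length in BYTES, not characters.
    command = f"UID SEARCH CHARSET {charset} {criterion} {{{len(payload)}}}".encode("ascii")
    return command, payload


line, payload = build_literal_search("SUBJECT", "Subjéct")
print(line)     # b'UID SEARCH CHARSET UTF-8 SUBJECT {8}'
print(payload)  # b'Subj\xc3\xa9ct'
```

"Subjéct" is 7 characters but 8 bytes in UTF-8, which is why the count must be taken from the encoded payload.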

How can I filter the subject with a UTF-8 string?

Asked By: Simon W


Answers:

I managed to solve it with the following, using the recommended `.uid` approach instead of `.search`:

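```
import imaplib

username = "secret"  # placeholder credentials
password = "secret"

imap = imaplib.IMAP4_SSL("server_to_connect_to")
imap.login(username, password)

status, messages = imap.select("INBOX", readonly=True)
# The literal is sent verbatim, so no extra quotes around the text
imap.literal = "Subject to be searched".encode("utf-8")
res, msg = imap.uid("SEARCH", "CHARSET", "UTF-8", "SUBJECT")
uids = msg[0].decode("utf-8").split()  # UIDs, not sequence numbers

for uid in uids:
    # A UID must be fetched with imap.uid("FETCH", ...), not imap.fetch(...)
    res, msg = imap.uid("FETCH", uid, "(RFC822)")
    # parsing logic
```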
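
Why the numbers from `imap.uid('SEARCH', ...)` looked "out of bounds": UID SEARCH returns unique identifiers (UIDs), which stay fixed for a message and often exceed the message count once mail has been deleted, while plain SEARCH returns message sequence numbers (1 up to the number of messages). A UID belongs with `imap.uid('FETCH', ...)` and a sequence number with `imap.fetch(...)`. The sketch below wraps the UID path in two functions; `search_subject_uids` and `fetch_by_uid` are hypothetical names, and it assumes an already logged-in `imaplib.IMAP4` (or `IMAP4_SSL`) with a mailbox selected.

```python
import imaplib
from typing import List


def search_subject_uids(imap: imaplib.IMAP4, subject: str) -> List[bytes]:
    """Return the UIDs of messages whose Subject contains `subject` (UTF-8 aware)."""
    # When .literal is set, imaplib appends it to the next command as an IMAP literal,
    # so the raw UTF-8 bytes reach the server without being encoded as ASCII.
    imap.literal = subject.encode("utf-8")
    typ, data = imap.uid("SEARCH", "CHARSET", "UTF-8", "SUBJECT")
    if typ != "OK":
        raise imaplib.IMAP4.error(f"UID SEARCH failed: {data!r}")
    # data looks like [b'1024 1031']; these are UIDs, not sequence numbers.
    return data[0].split() if data and data[0] else []


def fetch_by_uid(imap: imaplib.IMAP4, uid: bytes) -> bytes:
    """Fetch the raw RFC 822 message for one UID."""
    typ, data = imap.uid("FETCH", uid, "(RFC822)")
    if typ != "OK":
        raise imaplib.IMAP4.error(f"UID FETCH failed: {data!r}")
    # The message bytes are the second item of the first tuple in the response.
    for part in data:
        if isinstance(part, tuple):
            return part[1]
    raise imaplib.IMAP4.error(f"no message data for UID {uid!r}")
```

Usage, with `imap` logged in and `INBOX` selected: `for uid in search_subject_uids(imap, "Subjéct tô be searchéd"): raw = fetch_by_uid(imap, uid)`.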

Answered By: Simon W

Alternatively, with `.search` you can pass the charset as the first argument and then fetch each returned message ID:

```
import imaplib

EMAIL = "your_email"
PASSWORD = "your_password"

imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login(EMAIL, PASSWORD)
imap.select("INBOX", readonly=True)

# The literal carries the UTF-8 bytes; "utf-8" is sent as the CHARSET
imap.literal = "Subjéct tô be searchéd".encode("utf-8")
_, list_email_id = imap.search("utf-8", "SUBJECT")

# These are sequence numbers, so plain imap.fetch() is the matching call
for email_id in list_email_id[0].split():
    _, data = imap.fetch(email_id, "(RFC822)")
    # email = MailMessage(data)
```

You can use imap_tools to build a message object from `data`: imaplib returns raw bytes that you still have to decode, and the `MailMessage` class of imap_tools handles that decoding and is easy to use.
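
If you would rather stay with the standard library, `email.message_from_bytes` with `policy=email.policy.default` parses the raw bytes from a `(RFC822)` fetch and decodes RFC 2047 encoded-word headers such as `=?utf-8?q?...?=` into a plain `str`. This is an alternative to imap_tools, not a description of what imap_tools does internally; `subject_from_fetch` is a hypothetical helper name.

```python
import email
import email.policy
from typing import Optional


def subject_from_fetch(data: list) -> Optional[str]:
    """Return the decoded Subject from an imaplib FETCH (RFC822) response, or None."""
    for part in data:
        # imaplib returns [(b'1 (RFC822 {n}', b'<raw message>'), b')']; the tuple holds the bytes.
        if isinstance(part, tuple):
            msg = email.message_from_bytes(part[1], policy=email.policy.default)
            subject = msg["Subject"]
            return None if subject is None else str(subject)
    return None


raw = b"Subject: =?utf-8?q?Subj=C3=A9ct_t=C3=B4_be_search=C3=A9d?=\r\n\r\nHello\r\n"
fetch_data = [(b"1 (RFC822 {%d}" % len(raw), raw), b")"]
print(subject_from_fetch(fetch_data))  # Subjéct tô be searchéd
```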

Answered By: Lourivan luz