Searching for UTF-8 encoded subjects with imaplib
Question:
I have some working code to fetch mail bodies and I want to filter on the subject with a non-ASCII string. Other forums suggest using the .uid method to do so, but its behavior does not seem logical to me.
Current code:
import imaplib
import email
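username = "secret"  # placeholder
password = "secret"  # placeholder
imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login(username, password)
status, messages = imap.select("INBOX", readonly=True)
# Only works while the subject string is plain ASCII
res, msg = imap.search(None, 'HEADER Subject "string to be encoded with UTF-8"')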
Suggested code:
import imaplib
import email
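username = "secret"  # placeholder
password = "secret"  # placeholder
imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login(username, password)
status, messages = imap.select("INBOX", readonly=True)
# The search string is sent as an IMAP literal, appended after SUBJECT
imap.literal = "string to be encoded with UTF-8".encode("utf-8")
res, msg = imap.uid('SEARCH', 'CHARSET', 'UTF-8', 'SUBJECT')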
The suggested code runs, but the returned array (msg[0]) contains mailbox indices that are out of bounds. In contrast, when I use the .search method, valid indices are returned, but only as long as I search for ASCII strings (non-ASCII strings aren't accepted there, whether UTF-8 encoded or not). Because of this I don't quite understand the behaviour and logic of .uid. I'd be grateful if someone could point me in the right direction.
How can I filter the subject with a UTF-8 string?
Answers:
I managed to solve this with the following, using the recommended approach with .uid instead of .search:
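import imaplib
imap = imaplib.IMAP4_SSL("server_to_connect_to")
imap.login(username, password)
status, messages = imap.select("INBOX", readonly=True)
# The search string travels as an IMAP literal, so it may contain non-ASCII bytes
imap.literal = u'"Subject to be searched"'.encode('utf-8')
# uid('SEARCH', ...) returns UIDs, not message sequence numbers
res, msg = imap.uid('SEARCH', 'CHARSET', 'UTF-8', 'SUBJECT')
uids = msg[0].decode('utf-8').split()
for uid in uids:
    # UIDs have to be fetched through uid('fetch', ...), not imap.fetch
    res, msg = imap.uid('fetch', uid, '(RFC822)')
    # parsing logic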
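To expand on why the indices looked out of bounds in the question: a UID SEARCH returns message UIDs, not sequence numbers. UIDs are persistent identifiers and can be far larger than the number of messages in the mailbox, so they only make sense when passed back through .uid('fetch', ...). Below is a minimal sketch of turning the search response into a list of UIDs; the helper name parse_uid_list is made up for illustration and is not part of imaplib.

```python
def parse_uid_list(search_data):
    """Split the data part of an imaplib SEARCH response into UID strings.

    imaplib returns the data as a list holding one bytes object with
    space-separated numbers, e.g. [b'1045 2071 3999'].
    (Hypothetical helper, not an imaplib function.)
    """
    if not search_data or search_data[0] is None:
        return []
    return search_data[0].decode("ascii").split()


# Example with a made-up response; real values come from imap.uid('SEARCH', ...)
uids = parse_uid_list([b"1045 2071 3999"])
print(uids)
```

Each element can then be handed to imap.uid('fetch', uid, '(RFC822)'). Passing the same value to imap.fetch would treat it as a sequence number, which is presumably where the out-of-bounds impression came from.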
If you use .search instead, you can loop over the returned IDs and fetch each one:
```
import imaplib

EMAIL = 'your_email'
PASSWORD = 'your_password'
imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login(EMAIL, PASSWORD)
imap.select("INBOX", readonly=True)
# The UTF-8 bytes are sent as a literal right after the SUBJECT key
imap.literal = "Subjéct tô be searchéd".encode("utf-8")
_, list_email_id = imap.search("utf-8", "SUBJECT")
for email_id in list_email_id[0].split():
    # search returns sequence numbers, so plain fetch is the matching call
    _, data = imap.fetch(email_id, "(RFC822)")
    # email = MailMessage(data)
```
You can use imap_tools to wrap the fetched data, since imaplib returns raw bytes that still need decoding. The MailMessage class of imap_tools handles that and is easy to use.
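If you would rather stay with the standard library, the email module can decode the raw bytes as well. This is only a sketch: subject_from_fetch is a made-up helper name, and it assumes the fetch result has the usual imaplib shape of a list containing an (envelope, raw_message) tuple followed by a closing b')'.

```python
import email
from email import policy


def subject_from_fetch(fetch_data):
    """Return the decoded Subject of the first message in a fetch result.

    Assumes fetch_data looks like [(b'1 (RFC822 {123}', raw_bytes), b')'].
    (Hypothetical helper, not part of imaplib or imap_tools.)
    """
    for part in fetch_data:
        if isinstance(part, tuple):
            # policy.default decodes RFC 2047 encoded headers into str
            msg = email.message_from_bytes(part[1], policy=policy.default)
            subject = msg["Subject"]
            return None if subject is None else str(subject)
    return None
```

Inside the loop above this would be called as subject_from_fetch(data).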