How to parse an RFC 2822 date/time into a Python datetime?

Question:

I have a date of the form specified by RFC 2822 — say Fri, 15 May 2009 17:58:28 +0000, as a string. Is there a quick and/or standard way to get it as a datetime object in Python 2.5? I tried to produce a strptime format string, but the +0000 timezone specifier confuses the parser.

Asked By: millenomi


Answers:

There is a parsedate function in email.utils.
It parses all valid RFC 2822 dates and some special cases.

Answered By: ebo

from email.utils import parsedate
print(parsedate('Fri, 15 May 2009 17:58:28 +0000'))

Documentation.

Answered By: nosklo

The problem is that parsedate will ignore the offset.

Do this instead:

from email.utils import parsedate_tz
print(parsedate_tz('Fri, 15 May 2009 17:58:28 +0700'))
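
To make the difference concrete, here is a minimal sketch comparing the two return values. The variable names are only illustrative; the tuple layout (nine items from parsedate, ten from parsedate_tz, with the offset in seconds as the last item) is the documented behaviour of email.utils.

```python
from email.utils import parsedate, parsedate_tz

s = 'Fri, 15 May 2009 17:58:28 +0700'

plain = parsedate(s)       # 9-tuple; the +0700 offset is discarded
with_tz = parsedate_tz(s)  # 10-tuple; the last item is the offset in seconds

print(plain[:6])    # (2009, 5, 15, 17, 58, 28)
print(with_tz[:6])  # (2009, 5, 15, 17, 58, 28)
print(with_tz[9])   # 25200, i.e. 7 * 3600 seconds east of UTC
```
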

Answered By: Matt Jones

I’d like to elaborate on previous answers. email.utils.parsedate and email.utils.parsedate_tz both return tuples. Since the OP needs a datetime.datetime object, I’m adding these examples for completeness:

from email.utils import parsedate
from datetime import datetime
import time

t = parsedate('Sun, 14 Jul 2013 20:14:30 -0000')
d1 = datetime.fromtimestamp(time.mktime(t))

Or:

d2 = datetime(*t[:6])

Note that d1 and d2 are both naive datetime objects: no timezone information is stored. If you need aware datetime objects, see the tzinfo argument of datetime().
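
As a rough illustration of the naive/aware distinction, the sketch below attaches a tzinfo to the parsed value. It assumes Python 3.2 or later (for datetime.timezone) and assumes the parsed time really is UTC, which is what the -0000 example suggests; the names naive and aware are only illustrative.

```python
from datetime import datetime, timezone
from email.utils import parsedate

t = parsedate('Sun, 14 Jul 2013 20:14:30 -0000')

naive = datetime(*t[:6])                    # no tzinfo attached
aware = naive.replace(tzinfo=timezone.utc)  # assumption: the value is in UTC

print(naive.tzinfo)       # None
print(aware.isoformat())  # 2013-07-14T20:14:30+00:00
```
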

Alternatively, you could use the dateutil module.

Answered By: gonz

Python 3.3 and later provide a function, parsedate_to_datetime, in email.utils that takes care of the intermediate steps:

email.utils.parsedate_to_datetime(date)

The inverse of format_datetime(). Performs the same function as parsedate(), but on
success returns a datetime. If the input date has a timezone of -0000,
the datetime will be a naive datetime, and if the date is conforming
to the RFCs it will represent a time in UTC but with no indication of
the actual source timezone of the message the date comes from. If the
input date has any other valid timezone offset, the datetime will be
an aware datetime with the corresponding timezone tzinfo.

New in version 3.3.

http://python.readthedocs.org/en/latest/library/email.util.html#email.utils.parsedate_to_datetime
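
The -0000 rule in the quoted documentation is easy to miss, so here is a small sketch of it, assuming Python 3.3 or later. The variable names are only illustrative.

```python
from email.utils import parsedate_to_datetime

# -0000 yields a naive datetime; any other valid offset yields an aware one.
naive = parsedate_to_datetime('Fri, 15 May 2009 17:58:28 -0000')
aware = parsedate_to_datetime('Fri, 15 May 2009 17:58:28 +0000')

print(naive.isoformat())  # 2009-05-15T17:58:28
print(aware.isoformat())  # 2009-05-15T17:58:28+00:00
```
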

Answered By: erewok

email.utils.parsedate_tz(date) is the function to use. Following are some variations.

Email date/time string (RFC 5322, RFC 2822, RFC 1123) to Unix timestamp in seconds:

import email.utils
import calendar
def email_time_to_timestamp(s):
    tt = email.utils.parsedate_tz(s)
    if tt is None: return None
    return calendar.timegm(tt) - tt[9]

import time
print(time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(email_time_to_timestamp("Wed, 04 Jan 2017 09:55:45 -0800"))))
# 2017-01-04T17:55:45Z

Do not use time.mktime here: it interprets the struct_time in your computer’s local time zone, not UTC. Use calendar.timegm or email.utils.mktime_tz instead (but note the caveat about mktime_tz in the next paragraph).

If you are sure that you have Python 2.7.4, 3.2.4, 3.3, or newer, then you can use email.utils.mktime_tz(tt) instead of calendar.timegm(tt) - tt[9]. Before those versions, mktime_tz gave incorrect times when invoked during the local time zone’s fall daylight saving transition (bug 14653).
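
On a fixed Python version the two forms should agree. A quick sketch to check this on your own interpreter (the names manual and via_helper are only illustrative):

```python
import calendar
import email.utils

tt = email.utils.parsedate_tz("Wed, 04 Jan 2017 09:55:45 -0800")

manual = calendar.timegm(tt) - tt[9]    # treat the fields as UTC, then remove the offset
via_helper = email.utils.mktime_tz(tt)  # does the same on fixed versions

print(manual)                # 1483552545
print(manual == via_helper)  # True on Python 2.7.4, 3.2.4, 3.3, or newer
```
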

Thanks to @j-f-sebastian for caveats about mktime and mktime_tz.

Email date/time string (RFC 5322, RFC 2822, RFC 1123) to “aware” datetime on Python 3.3+:

On Python 3.3 and above, use email.utils.parsedate_to_datetime, which returns an aware datetime with the original zone offset:

import email.utils
email.utils.parsedate_to_datetime(s)

print(email.utils.parsedate_to_datetime("Wed, 04 Jan 2017 09:55:45 -0800").isoformat())
# 2017-01-04T09:55:45-08:00

Caveat: this will raise ValueError if the time falls on a leap second, e.g. email.utils.parsedate_to_datetime("Sat, 31 Dec 2016 15:59:60 -0800").
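
One possible way around the leap-second caveat is to fall back to parsedate_tz and clamp the seconds field to 59. This is only a sketch, assuming Python 3.3 or later; parse_email_date_tolerant is a hypothetical helper name, not part of email.utils, and clamping loses one second of precision.

```python
import datetime
import email.utils

def parse_email_date_tolerant(s):
    """Hypothetical helper: like parsedate_to_datetime, but clamps a :60 second to :59."""
    try:
        return email.utils.parsedate_to_datetime(s)
    except ValueError:
        tt = email.utils.parsedate_tz(s)
        if tt is None:
            raise  # the string is not a parsable date at all
        tz = datetime.timezone(datetime.timedelta(seconds=tt[9]))
        # Clamp the seconds field so datetime() accepts it.
        return datetime.datetime(*tt[:5], min(tt[5], 59), tzinfo=tz)

print(parse_email_date_tolerant("Sat, 31 Dec 2016 15:59:60 -0800").isoformat())
# 2016-12-31T15:59:59-08:00
```
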

Email date/time string (RFC 5322, RFC 2822, RFC 1123) to datetime in UTC:

This just converts to a timestamp and then to a UTC datetime. Note that utcfromtimestamp returns a naive datetime (no tzinfo attached), as the output shows:

import email.utils
import calendar
import datetime
def email_time_to_utc_datetime(s):
    tt = email.utils.parsedate_tz(s)
    if tt is None: return None
    timestamp = calendar.timegm(tt) - tt[9]
    return datetime.datetime.utcfromtimestamp(timestamp)

print(email_time_to_utc_datetime("Wed, 04 Jan 2017 09:55:45 -0800").isoformat())
# 2017-01-04T17:55:45
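
The result above carries no tzinfo. If an aware UTC datetime is wanted, a variant along these lines may work, assuming Python 3.2 or later for datetime.timezone; email_time_to_aware_utc_datetime is a hypothetical name chosen for this sketch.

```python
import calendar
import datetime
import email.utils

def email_time_to_aware_utc_datetime(s):
    """Hypothetical variant that attaches tzinfo=UTC to the result."""
    tt = email.utils.parsedate_tz(s)
    if tt is None:
        return None
    timestamp = calendar.timegm(tt) - tt[9]
    # Passing tz makes fromtimestamp return an aware datetime in that zone.
    return datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)

print(email_time_to_aware_utc_datetime("Wed, 04 Jan 2017 09:55:45 -0800").isoformat())
# 2017-01-04T17:55:45+00:00
```
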

Email date/time string (RFC 5322, RFC 2822, RFC 1123) to python “aware” datetime with original offset:

Prior to Python 3.2, Python did not come with tzinfo implementations, so here is an example using dateutil.tz.tzoffset (pip install python-dateutil):

import email.utils
import datetime
import dateutil.tz
def email_time_to_datetime(s):
    tt = email.utils.parsedate_tz(s)
    if tt is None: return None
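    offset = tt[9]
    # Build the zone name from the absolute offset so negative offsets format correctly.
    sign = "-" if offset < 0 else "+"
    hours, minutes = divmod(abs(offset) // 60, 60)
    tz = dateutil.tz.tzoffset("UTC%s%02d%02d" % (sign, hours, minutes), offset)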
    return datetime.datetime(*tt[:5]+(min(tt[5], 59),), tzinfo=tz)

print(email_time_to_datetime("Wed, 04 Jan 2017 09:55:45 -0800").isoformat())
# 2017-01-04T09:55:45-08:00

If you are using Python 3.2 or later, you can use the built-in tzinfo implementation datetime.timezone: tz = datetime.timezone(datetime.timedelta(seconds=tt[9])) instead of the third-party dateutil.tz.tzoffset.

Thanks to @j-f-sebastian again for note on clamping the leap second.

Answered By: yonran