How do I parse an HTTP date-string in Python?

Question:

Is there an easy way to parse HTTP date-strings in Python? According to the standard, there are several ways to format HTTP date strings; the method should be able to handle this.

In other words, I want to convert a string like “Wed, 23 Sep 2009 22:15:29 GMT” to a python time-structure.

Asked By: Troels Arvin

||

Answers:

>>> import datetime
>>> datetime.datetime.strptime('Wed, 23 Sep 2009 22:15:29 GMT', '%a, %d %b %Y %H:%M:%S GMT')
datetime.datetime(2009, 9, 23, 22, 15, 29)
Answered By: SilentGhost
>>> import email.utils, datetime
>>> email.utils.parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, -1)

If you want a datetime.datetime object, you can do:

# Python <3.3
def my_parsedate(text):
    return datetime.datetime(*eut.parsedate(text)[:6])

# Python ≥3.3
def my_parsedate(text):
    return email.utils.parsedate_to_datetime(text)

email.utils.parsedate

Attempts to parse a date according to the rules in RFC 2822. however, some mailers don’t follow that format as specified, so parsedate() tries to guess correctly in such cases. date is a string containing an RFC 2822 date, such as "Mon, 20 Nov 1995 19:12:08 -0500". If it succeeds in parsing the date, parsedate() returns a 9-tuple that can be passed directly to time.mktime(); otherwise None will be returned. Note that indexes 6, 7, and 8 of the result tuple are not usable.

email.utils.parsedate_to_datetime

The inverse of format_datetime(). Performs the same function as parsedate(), but on success returns a datetime; otherwise ValueError is raised if date contains an invalid value such as an hour greater than 23 or a timezone offset not between -24 and 24 hours. If the input date has a timezone of -0000, the datetime will be a naive datetime, and if the date is conforming to the RFCs it will represent a time in UTC but with no indication of the actual source timezone of the message the date comes from. If the input date has any other valid timezone offset, the datetime will be an aware datetime with the corresponding a timezone tzinfo.

Answered By: tzot
httplib.HTTPMessage(filehandle).getdate(headername)
httplib.HTTPMessage(filehandle).getdate_tz(headername)
mimetools.Message(filehandle).getdate()
rfc822.parsedate(datestr)
rfc822.parsedate_tz(datestr)
  • if you have a raw data stream, you can build an HTTPMessage or a mimetools.Message from it. it may offer additional help while querying the response object for infos
  • if you are using urllib2, you already have an HTTPMessage object hidden in the filehandler returned by urlopen
  • it can probably parse many date formats
  • httplib is in the core

NOTE:

  • had a look at implementation, HTTPMessage inherits from mimetools.Message which inherits from rfc822.Message. two floating defs are of your interest maybe, parsedate and parsedate_tz (in the latter)
  • parsedate(_tz) from email.utils has a different implementation, although it looks kind of the same.

you can do this, if you only have that piece of string and you want to parse it:

>>> from rfc822 import parsedate, parsedate_tz
>>> parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
>>> 

but let me exemplify through mime messages:

import mimetools
import StringIO
message = mimetools.Message(
    StringIO.StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMTrnrn'))
>>> m
<mimetools.Message instance at 0x7fc259146710>
>>> m.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)

or via http messages (responses)

>>> from httplib import HTTPMessage
>>> from StringIO import StringIO
>>> http_response = HTTPMessage(StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMTrnrn'))
>>> #http_response can be grabbed via urllib2.urlopen(url).info(), right?
>>> http_response.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)

right?

>>> import urllib2
>>> urllib2.urlopen('https://fw.io/').info().getdate('Date')
(2014, 2, 19, 18, 53, 26, 0, 1, 0)

there, now we now more about date formats, mime messages, mime tools and their pythonic implementation 😉

whatever the case, looks better than using email.utils for parsing http headers.

Answered By: user237419

Since Python 3.3 there’s email.utils.parsedate_to_datetime which can parse RFC 5322 timestamps (aka IMF-fixdate, Internet Message Format fixed length format, a subset of HTTP-date of RFC 7231).

>>> from email.utils import parsedate_to_datetime
... 
... s = 'Sun, 06 Nov 1994 08:49:37 GMT'
... parsedate_to_datetime(s)
0: datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)

There’s also undocumented http.cookiejar.http2time which can achieve the same as follows:

>>> from datetime import datetime, timezone
... from http.cookiejar import http2time
... 
... s = 'Sun, 06 Nov 1994 08:49:37 GMT'
... datetime.utcfromtimestamp(http2time(s)).replace(tzinfo=timezone.utc)
1: datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)

It was introduced in Python 2.4 as cookielib.http2time for dealing with Cookie Expires directive which is expressed in the same format.

Answered By: saaj
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.