How do I parse an HTTP date-string in Python?
Question:
Is there an easy way to parse HTTP date-strings in Python? According to the standard, there are several ways to format HTTP date strings; the method should be able to handle this.
In other words, I want to convert a string like “Wed, 23 Sep 2009 22:15:29 GMT” to a Python time structure.
Answers:
>>> import datetime
>>> datetime.datetime.strptime('Wed, 23 Sep 2009 22:15:29 GMT', '%a, %d %b %Y %H:%M:%S GMT')
datetime.datetime(2009, 9, 23, 22, 15, 29)
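The single strptime pattern above covers only the preferred fixed-length form. Since the question asks about the other permitted forms as well (the obsolete RFC 850 style and the C asctime() style), here is a minimal sketch that tries each pattern in turn. The name parse_http_date and the pattern table are illustrative, not part of any library, and matching %a/%A/%b assumes an English (C) locale.

```python
from datetime import datetime, timezone

# strptime patterns for the three HTTP-date forms (illustrative table).
HTTP_DATE_PATTERNS = (
    '%a, %d %b %Y %H:%M:%S GMT',   # Sun, 06 Nov 1994 08:49:37 GMT
    '%A, %d-%b-%y %H:%M:%S GMT',   # Sunday, 06-Nov-94 08:49:37 GMT
    '%a %b %d %H:%M:%S %Y',        # Sun Nov  6 08:49:37 1994
)

def parse_http_date(text):
    """Return an aware UTC datetime, or None if no pattern matches."""
    for pattern in HTTP_DATE_PATTERNS:
        try:
            parsed = datetime.strptime(text.strip(), pattern)
        except ValueError:
            continue
        # HTTP dates are always expressed in GMT, so attach UTC explicitly.
        return parsed.replace(tzinfo=timezone.utc)
    return None
```

Note that %y uses Python's own two-digit-year pivot, which may differ from the rule a given HTTP specification prescribes for the obsolete RFC 850 form.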
>>> import email.utils, datetime
>>> email.utils.parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, -1)
If you want a datetime.datetime object, you can do:
# Python <3.3
def my_parsedate(text):
    return datetime.datetime(*email.utils.parsedate(text)[:6])
# Python ≥3.3
def my_parsedate(text):
    return email.utils.parsedate_to_datetime(text)
email.utils.parsedate
Attempts to parse a date according to the rules in RFC 2822. However, some mailers don’t follow that format as specified, so parsedate() tries to guess correctly in such cases. date is a string containing an RFC 2822 date, such as "Mon, 20 Nov 1995 19:12:08 -0500". If it succeeds in parsing the date, parsedate() returns a 9-tuple that can be passed directly to time.mktime(); otherwise None will be returned. Note that indexes 6, 7, and 8 of the result tuple are not usable.
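Because parsedate() returns None rather than raising, callers need an explicit check. As a small sketch (the helper name http_date_to_epoch is hypothetical), the variant below uses parsedate_tz() and mktime_tz() from the same module so that the offset in the string is honoured; time.mktime() would instead interpret the tuple as local time.

```python
import email.utils

def http_date_to_epoch(text):
    """Return seconds since the epoch (UTC), or None if parsing fails."""
    parsed = email.utils.parsedate_tz(text)
    if parsed is None:
        return None
    # mktime_tz() applies the offset stored in the tenth tuple element.
    return email.utils.mktime_tz(parsed)
```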
email.utils.parsedate_to_datetime
The inverse of format_datetime(). Performs the same function as parsedate(), but on success returns a datetime; otherwise ValueError is raised if date contains an invalid value such as an hour greater than 23 or a timezone offset not between -24 and 24 hours. If the input date has a timezone of -0000, the datetime will be a naive datetime, and if the date is conforming to the RFCs it will represent a time in UTC but with no indication of the actual source timezone of the message the date comes from. If the input date has any other valid timezone offset, the datetime will be an aware datetime with the corresponding timezone tzinfo.
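The naive-versus-aware distinction matters if you compare or subtract the results. A hedged sketch of one way to normalise everything to aware UTC datetimes follows; parse_to_utc is a made-up helper name, and treating a naive result as UTC relies on the -0000 rule quoted above.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

def parse_to_utc(text):
    """Parse an RFC 5322 style date and return an aware datetime in UTC."""
    try:
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Some older Python 3 releases raise TypeError for unparseable input.
        return None
    if parsed.tzinfo is None:
        # A -0000 offset yields a naive datetime that represents UTC.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```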
httplib.HTTPMessage(filehandle).getdate(headername)
httplib.HTTPMessage(filehandle).getdate_tz(headername)
mimetools.Message(filehandle).getdate()
rfc822.parsedate(datestr)
rfc822.parsedate_tz(datestr)
- If you have a raw data stream, you can build an HTTPMessage or a mimetools.Message from it. That may also help when you query the response object for other information.
- If you are using urllib2, you already have an HTTPMessage object, available from the file handle returned by urlopen.
- It can probably parse many date formats.
- httplib is in the standard library.
NOTE:
- I had a look at the implementation: HTTPMessage inherits from mimetools.Message, which inherits from rfc822.Message. Two module-level functions may be of interest, parsedate and parsedate_tz (both in the rfc822 module).
- parsedate(_tz) from email.utils has a different implementation, although it looks much the same.
You can do this if you only have the string and want to parse it:
>>> from rfc822 import parsedate, parsedate_tz
>>> parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
But let me illustrate with MIME messages:
>>> import mimetools
>>> import StringIO
>>> m = mimetools.Message(
...     StringIO.StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> m
<mimetools.Message instance at 0x7fc259146710>
>>> m.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
Or via HTTP messages (responses):
>>> from httplib import HTTPMessage
>>> from StringIO import StringIO
>>> http_response = HTTPMessage(StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> # http_response can be grabbed via urllib2.urlopen(url).info(), right?
>>> http_response.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)
Right:
>>> import urllib2
>>> urllib2.urlopen('https://fw.io/').info().getdate('Date')
(2014, 2, 19, 18, 53, 26, 0, 1, 0)
There, now we know more about date formats, MIME messages, mimetools and their Python implementation 😉
Whatever the case, this looks better than using email.utils for parsing HTTP headers.
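The modules used in this answer (httplib, mimetools, rfc822, urllib2, StringIO) exist only in Python 2. As a rough Python 3 counterpart, and only a sketch, the standard email package can parse a header block and hand the Date value to parsedate_to_datetime. The sample header text and the name date_from_headers are made up for illustration.

```python
import email
from email.utils import parsedate_to_datetime

def date_from_headers(raw_headers):
    """Parse a raw header block and return its Date header as a datetime."""
    message = email.message_from_string(raw_headers)
    value = message['Date']          # None when the header is absent
    if value is None:
        return None
    return parsedate_to_datetime(value)

# Hypothetical header block, terminated by the usual blank line.
sample = 'Date: Wed, 23 Sep 2009 22:15:29 GMT\r\nContent-Type: text/html\r\n\r\n'
print(date_from_headers(sample))
```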
Since Python 3.3 there’s email.utils.parsedate_to_datetime, which can parse RFC 5322 timestamps (aka IMF-fixdate, the Internet Message Format fixed-length format, a subset of HTTP-date of RFC 7231).
>>> from email.utils import parsedate_to_datetime
>>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
>>> parsedate_to_datetime(s)
datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)
There’s also the undocumented http.cookiejar.http2time, which can achieve the same as follows:
>>> from datetime import datetime, timezone
>>> from http.cookiejar import http2time
>>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
>>> datetime.utcfromtimestamp(http2time(s)).replace(tzinfo=timezone.utc)
datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)
It was introduced in Python 2.4 as cookielib.http2time for dealing with the Cookie Expires directive, which is expressed in the same format.
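One caveat: http2time() returns None when it does not recognise the string, and datetime.utcfromtimestamp() is deprecated as of Python 3.12. A small wrapper that handles both might look like the sketch below; http2time_to_datetime is a hypothetical name, and since http2time is undocumented its behaviour could change between releases.

```python
from datetime import datetime, timezone
from http.cookiejar import http2time

def http2time_to_datetime(text):
    """Return an aware UTC datetime, or None if http2time() cannot parse."""
    seconds = http2time(text)
    if seconds is None:
        return None
    # fromtimestamp() with an explicit tz builds the aware datetime directly.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```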