Overriding urllib2.HTTPError or urllib.error.HTTPError and reading response HTML anyway

Question:

I receive an ‘HTTP Error 500: Internal Server Error’ response, but I still want to read the data inside the error HTML.

With Python 2.6, I normally fetch a page using:

import urllib2
url = "http://google.com"
data = urllib2.urlopen(url)
data = data.read()

When attempting to use this on the failing URL, I get the exception urllib2.HTTPError:

urllib2.HTTPError: HTTP Error 500: Internal Server Error

How can I fetch such error pages (with or without urllib2) while they are returning Internal Server Errors?

Note that with Python 3, the corresponding exception is urllib.error.HTTPError.
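For reference, a minimal sketch of the same fetch under Python 3, using urllib.request (a failing URL raises urllib.error.HTTPError instead):

import urllib.request

url = "http://google.com"
data = urllib.request.urlopen(url).read()  # raises urllib.error.HTTPError on a 500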

Asked By: backus


Answers:

If you mean you want to read the body of the 500:

request = urllib2.Request(url, data, headers)
try:
    resp = urllib2.urlopen(request)
    print resp.read()
except urllib2.HTTPError, error:
    print "ERROR: ", error.read()

In your case, you don’t need to build up the request. Just do

try:
    resp = urllib2.urlopen(url)
    print resp.read()
except urllib2.HTTPError, error:
    print "ERROR: ", error.read()

So you don’t override urllib2.HTTPError; you just handle the exception.
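For Python 3 (where the exception is urllib.error.HTTPError, as the question notes), a sketch of the same pattern, assuming url is defined as above:

import urllib.request
import urllib.error

try:
    resp = urllib.request.urlopen(url)
    print(resp.read())
except urllib.error.HTTPError as error:
    # the body of the error page is still readable from the exception
    print("ERROR:", error.read())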

Answered By: sberry

The HTTPError is a file-like object. You can catch it and then read its contents.

try:
    resp = urllib2.urlopen(url)
    contents = resp.read()
except urllib2.HTTPError, error:
    contents = error.read()
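Besides the body, the caught HTTPError also exposes the status code and the headers of the error response (error.code and error.info()); a small sketch, again assuming url is defined:

try:
    resp = urllib2.urlopen(url)
    contents = resp.read()
except urllib2.HTTPError, error:
    contents = error.read()    # body of the error page
    status = error.code        # e.g. 500
    headers = error.info()     # headers sent with the error response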
Answered By: Joe Holloway

import urllib2

alist = ['http://someurl.com']

def testUrl():
    errList = []
    for URL in alist:
        try:
            urllib2.urlopen(URL)
        except urllib2.URLError, err:
            # str(err) works for both URLError and HTTPError
            errList.append(URL + " " + str(err))
    return "\n".join(errList)

print testUrl()
Answered By: Gal Levy
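Note that urllib2.URLError and urllib2.HTTPError report failures differently: HTTPError (a subclass of URLError) carries the HTTP status in err.code and lets you read the error body, while a plain URLError (for example a DNS failure or a refused connection) only has err.reason. A sketch of telling them apart, assuming url is defined as above:

import urllib2

try:
    urllib2.urlopen(url)
except urllib2.HTTPError, err:
    # the server answered, but with an error status; the body is still readable
    print "HTTP status:", err.code
    print err.read()
except urllib2.URLError, err:
    # no valid HTTP response at all
    print "Failed to reach the server:", err.reason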