Stop urllib.request from raising exceptions on HTTP errors

Question:

Python’s urllib.request.urlopen() will raise an exception if the HTTP status code of the request is not OK (e.g., 404).

This is because the default opener uses the HTTPDefaultErrorHandler class:

A class which defines a default handler for HTTP error responses; all responses are turned into HTTPError exceptions.

Even if you build your own opener, it (un)helpfully includes the HTTPDefaultErrorHandler for you implicitly.

If, however, you don’t want Python to raise an exception if you get a non-OK response, it’s unclear how to disable this behavior.

Asked By: rgov


Answers:

If you build your own opener with build_opener(), the documentation notes:

Instances of the following classes will be in front of the handlers, unless the handlers contain them, instances of them or subclasses of them: … HTTPDefaultErrorHandler

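The elided part of that list also includes HTTPErrorProcessor, which is the handler that passes every non-2xx response on to the error handlers (ending with HTTPDefaultErrorHandler, which raises). Therefore, we need to make our own subclass of HTTPErrorProcessor that simply passes the response through the pipeline unmodified. Then build_opener() will use our processor instead of the default one, and the error handlers are never invoked.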
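The substitution can be observed directly by listing the HTTPErrorProcessor instances each opener ends up with. This is only a rough sketch: the names PassThroughProcessor and error_processor_names are made up for illustration, and opener.handlers is an undocumented attribute of OpenerDirector in CPython, so treat it as an inspection aid rather than a supported API.

```python
import urllib.request


class PassThroughProcessor(urllib.request.HTTPErrorProcessor):
    """Hypothetical subclass that returns every response unchanged."""

    def http_response(self, request, response):
        return response

    https_response = http_response


def error_processor_names(opener):
    # opener.handlers is a CPython implementation detail (assumption: it
    # holds every handler that was registered on the opener).
    return [
        type(handler).__name__
        for handler in opener.handlers
        if isinstance(handler, urllib.request.HTTPErrorProcessor)
    ]


default_names = error_processor_names(urllib.request.build_opener())
custom_names = error_processor_names(
    urllib.request.build_opener(PassThroughProcessor)
)

print(default_names)  # ['HTTPErrorProcessor']
print(custom_names)   # ['PassThroughProcessor']
```

No request is made here; the sketch only shows that supplying a subclass keeps build_opener() from adding its own HTTPErrorProcessor.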

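import urllib.request

class NonRaisingHTTPErrorProcessor(urllib.request.HTTPErrorProcessor):
    # Return every response as-is instead of dispatching non-2xx ones to the error handlers.
    def http_response(self, request, response):
        return response

    https_response = http_response

# build_opener() leaves out its default HTTPErrorProcessor because we supplied a subclass.
opener = urllib.request.build_opener(NonRaisingHTTPErrorProcessor)
response = opener.open('http://example.com/doesnt-exist')
print(response.status)  # prints 404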
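To try this without depending on example.com, here is a minimal sketch that runs against a throwaway local server and contrasts the default behavior with the non-raising opener. The AlwaysNotFound handler, the automatically chosen port, and the empty ProxyHandler (added so that proxy environment variables cannot interfere with a localhost request) are assumptions made for the demo.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlwaysNotFound(BaseHTTPRequestHandler):
    """Hypothetical test handler: answers every GET with a 404."""

    def do_GET(self):
        body = b"nothing here"
        self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


class NonRaisingHTTPErrorProcessor(urllib.request.HTTPErrorProcessor):
    def http_response(self, request, response):
        return response

    https_response = http_response


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), AlwaysNotFound)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/doesnt-exist"

# Default error processing: the 404 surfaces as an HTTPError.
default_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
try:
    default_opener.open(url)
    raised_code = None
except urllib.error.HTTPError as err:
    raised_code = err.code
    err.close()

# Pass-through processing: the 404 comes back as an ordinary response.
quiet_opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}), NonRaisingHTTPErrorProcessor
)
with quiet_opener.open(url) as response:
    status = response.status
    body = response.read()

server.shutdown()
server.server_close()

print(raised_code)   # 404, delivered via the exception
print(status, body)  # 404 b'nothing here', delivered via the response
```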

This answer (including the code sample) was not written by ChatGPT, but it did point out the solution.
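One consequence of passing every response through is that 3xx responses are returned as-is too, because urllib's redirect handling is triggered by the same HTTPErrorProcessor step. If redirects should still be followed, one possible variant returns only 4xx/5xx responses unchanged and delegates everything else to the base class. The following is a sketch under those assumptions; ErrorsOnlyPassThrough and the local DemoHandler test server are hypothetical names, not something from the urllib documentation.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new, anything else is 404."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/new":
            self._reply(200, b"ok")
        else:
            self._reply(404, b"missing")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


class ErrorsOnlyPassThrough(urllib.request.HTTPErrorProcessor):
    def http_response(self, request, response):
        # Hand 4xx/5xx responses back untouched; let the base class deal
        # with everything else, which keeps redirect handling working.
        if response.status >= 400:
            return response
        return super().http_response(request, response)

    https_response = http_response


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# The empty ProxyHandler keeps proxy environment variables out of the demo.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}), ErrorsOnlyPassThrough
)
with opener.open(base + "/old") as response:
    redirected_status = response.status
    redirected_url = response.url
with opener.open(base + "/missing") as response:
    missing_status = response.status

server.shutdown()
server.server_close()

print(redirected_status, redirected_url)  # 200 and a URL ending in /new
print(missing_status)                     # 404, without an exception
```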

Answered By: rgov