What errors/exceptions do I need to handle with urllib2.Request / urlopen?
Question:
I have the following code to do a postback to a remote URL:
    request = urllib2.Request('http://www.example.com', postBackData, {'User-Agent': 'My User Agent'})
    try:
        response = urllib2.urlopen(request)
    except urllib2.HTTPError, e:
        checksLogger.error('HTTPError = ' + str(e.code))
    except urllib2.URLError, e:
        checksLogger.error('URLError = ' + str(e.reason))
    except httplib.HTTPException, e:
        checksLogger.error('HTTPException')
The postBackData is created from a dictionary encoded with urllib.urlencode, and checksLogger is a logger set up with the logging module.
I have had a problem where this code exits while the remote server is down (this runs on customer servers, so I don't know what the exit stack dump / error is at this time). I'm assuming an exception and/or error is being raised that isn't handled above. Are there any other exceptions that might be triggered that I'm not handling?
Answers:
You can catch all exceptions and log what gets caught:
    import sys
    import traceback

    def formatExceptionInfo(maxTBlevel=5):
        cla, exc, trbk = sys.exc_info()
        excName = cla.__name__
        try:
            excArgs = exc.__dict__["args"]
        except KeyError:
            excArgs = "<no args>"
        excTb = traceback.format_tb(trbk, maxTBlevel)
        return (excName, excArgs, excTb)

    try:
        x = x + 1
    except:
        print formatExceptionInfo()
(Code from http://www.linuxjournal.com/article/5821)
Also read the documentation on sys.exc_info.
From the urlopen entry on the docs page, it looks like you just need to catch URLError. If you really want to hedge your bets against problems within the urllib code, you can also catch Exception as a fall-back. Do not use a bare except:, since that will also catch SystemExit and KeyboardInterrupt.
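To see that difference concretely, here is a small sketch (plain Python, no urllib involved) showing that a bare except: swallows SystemExit while except Exception: lets it propagate:

```python
# A bare "except:" catches BaseException subclasses such as SystemExit
# and KeyboardInterrupt; "except Exception:" does not.

def bare_except(exc):
    try:
        raise exc
    except:                 # catches everything, including SystemExit
        return "swallowed"

def except_exception(exc):
    try:
        raise exc
    except Exception:       # SystemExit is not an Exception subclass
        return "swallowed"

print(bare_except(SystemExit()))          # swallowed
try:
    except_exception(SystemExit())
except SystemExit:
    print("SystemExit escaped the handler")
```

This is why the fall-back handler should be written as except Exception: rather than a bare except:.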
Edit: What I mean to say is that you are already catching the errors it is supposed to throw. If it is throwing something else, that is probably because the urllib code failed to catch something it should have caught and wrapped in a URLError. Even the stdlib tends to miss simple things like AttributeError. Catching Exception as a fall-back (and logging what it caught) will help you figure out what is happening, without trapping SystemExit and KeyboardInterrupt.
    $ grep "raise" /usr/lib64/python/urllib2.py
    IOError); for HTTP errors, raises an HTTPError, which can also be
    raise AttributeError, attr
    raise ValueError, "unknown url type: %s" % self.__original
    # XXX raise an exception if no one else should try to handle
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    perform the redirect. Otherwise, raise HTTPError if no-one
    raise HTTPError(req.get_full_url(), code, msg, headers, fp)
    raise HTTPError(req.get_full_url(), code,
    raise HTTPError(req.get_full_url(), 401, "digest auth failed",
    raise ValueError("AbstractDigestAuthHandler doesn't know "
    raise URLError('no host given')
    raise URLError('no host given')
    raise URLError(err)
    raise URLError('unknown url type: %s' % type)
    raise URLError('file not on local host')
    raise IOError, ('ftp error', 'no host given')
    raise URLError(msg)
    raise IOError, ('ftp error', msg), sys.exc_info()[2]
    raise GopherError('no host given')
There is also the possibility of exceptions in urllib2 dependencies, or of exceptions caused by genuine bugs.
You are best off logging all uncaught exceptions in a file via a custom sys.excepthook. The key rule of thumb here is to never catch exceptions you aren’t planning to correct, and logging is not a correction. So don’t catch them just to log them.
Add generic exception handler:
    request = urllib2.Request('http://www.example.com', postBackData, {'User-Agent': 'My User Agent'})
    try:
        response = urllib2.urlopen(request)
    except urllib2.HTTPError, e:
        checksLogger.error('HTTPError = ' + str(e.code))
    except urllib2.URLError, e:
        checksLogger.error('URLError = ' + str(e.reason))
    except httplib.HTTPException, e:
        checksLogger.error('HTTPException')
    except Exception:
        import traceback
        checksLogger.error('generic exception: ' + traceback.format_exc())
I catch:
httplib.HTTPException
urllib2.HTTPError
urllib2.URLError
I believe this covers everything including socket errors.
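One subtlety worth knowing when catching that set: HTTPError is a subclass of URLError (true in urllib2 and in its Python 3 successor urllib.error), so the HTTPError clause must come before the URLError clause or it will never run. A quick check, written against Python 3's urllib.error since urllib2 only exists on Python 2:

```python
from urllib.error import HTTPError, URLError

# HTTPError subclasses URLError, so except-clause order matters:
# the more specific HTTPError has to be listed first.
assert issubclass(HTTPError, URLError)

def classify(exc):
    try:
        raise exc
    except HTTPError as e:        # must come before URLError
        return "HTTPError %d" % e.code
    except URLError as e:
        return "URLError: %s" % e.reason

err = HTTPError("http://www.example.com", 404, "Not Found", None, None)
print(classify(err))              # HTTPError 404
```

If the URLError clause were listed first, it would capture HTTPError too and the status-code logging would never fire.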