Catch exception gets UnboundLocalError
Question:
I wrote a crawler to fetch information from a Q&A website. Since not all fields are present on every page, I used multiple try/except blocks to handle the missing ones.
    def answerContentExtractor(loginSession, questionLinkQueue, answerContentList):
        while True:
            URL = questionLinkQueue.get()
            try:
                response = loginSession.get(URL, timeout=MAX_WAIT_TIME)
                raw_data = response.text
                # These fields must exist, or something went wrong...
                questionId = re.findall(REGEX, raw_data)[0]
                answerId = re.findall(REGEX, raw_data)[0]
                title = re.findall(REGEX, raw_data)[0]
            except requests.exceptions.Timeout ,IndexError:
                print >> sys.stderr, URL + " extraction error..."
                questionLinkQueue.task_done()
                continue
            try:
                questionInfo = re.findall(REGEX, raw_data)[0]
            except IndexError:
                questionInfo = ""
            try:
                answerContent = re.findall(REGEX, raw_data)[0]
            except IndexError:
                answerContent = ""
            result = {
                'questionId': questionId,
                'answerId': answerId,
                'title': title,
                'questionInfo': questionInfo,
                'answerContent': answerContent
            }
            answerContentList.append(result)
            questionLinkQueue.task_done()
This code intermittently raises the following exception at runtime:

    UnboundLocalError: local variable 'IndexError' referenced before assignment
The line number indicates the error occurs at the second except IndexError:
Thanks, everyone, for your suggestions. I would love to give you all the marks you deserve; too bad I can only mark one answer as correct.
Answers:
When you say

    except requests.exceptions.Timeout ,IndexError:

Python will catch only requests.exceptions.Timeout, and the caught exception object will be bound to the name IndexError. It should have been something like this:

    except (requests.exceptions.Timeout, IndexError) as e:
In Python 2.x, the line

    except requests.exceptions.Timeout, IndexError:

is equivalent to

    except requests.exceptions.Timeout as IndexError:

Thus, the exception caught by requests.exceptions.Timeout is assigned to IndexError. A simpler example:
    try:
        true
    except NameError, IndexError:
        print IndexError
    # name 'true' is not defined
To catch multiple exceptions, put the names in parentheses:
    except (requests.exceptions.Timeout, IndexError):

Later, an UnboundLocalError can occur because the assignment to IndexError makes it a local variable (shadowing the builtin name):

    >>> 'IndexError' in answerContentExtractor.func_code.co_varnames
    True
So, if requests.exceptions.Timeout was not raised, IndexError will not have been (incorrectly) defined when the code attempts except IndexError:.
Again, a simpler example:
    def func():
        try:
            func  # defined, so the except block doesn't run,
        except NameError, IndexError:  # so the local `IndexError` isn't assigned
            pass
        try:
            [][1]
        except IndexError:
            pass

    func()
    # UnboundLocalError: local variable 'IndexError' referenced before assignment
In 3.x, the problem will occur (after fixing the except syntax, which makes the error more obvious) even if the first exception is caught. This is because the local name IndexError will then be explicitly deleted (del) after the first try/except block.
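The Python 3 behaviour described here can be reproduced with a minimal sketch. The function name demo and the use of ValueError as the first exception are illustrative assumptions, not part of the original crawler, and the exact wording of the error message varies between Python 3 versions.

```python
def demo():
    try:
        raise ValueError("first failure")
    except ValueError as IndexError:
        # The handler runs and binds the local name IndexError, but
        # Python 3 deletes that name when the except block ends.
        pass
    try:
        [][1]  # raises the real builtin IndexError
    except IndexError:
        # IndexError is a local with no value here, so evaluating it
        # raises UnboundLocalError instead of matching the exception.
        pass


try:
    demo()
except UnboundLocalError as exc:
    error = exc

print(type(error).__name__)
```

Renaming the bound variable (for example, as e) removes the shadowing, and the second handler then matches the builtin IndexError again.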
    except requests.exceptions.Timeout ,IndexError:

means the same as except requests.exceptions.Timeout as IndexError. You should use

    except (requests.exceptions.Timeout, IndexError):

instead.
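As a rough illustration of the parenthesized form, here is a small self-contained sketch in Python 3 syntax. It is not the crawler itself: fetch_first and timed_out are hypothetical helpers, and the builtin TimeoutError stands in for requests.exceptions.Timeout so that the example runs without the requests library.

```python
def fetch_first(fetch):
    """Call fetch() and return the first element of its result.

    fetch is a hypothetical zero-argument callable standing in for the
    network request plus regex extraction in the question.
    """
    try:
        return fetch()[0]
    except (TimeoutError, IndexError) as exc:
        # Either exception type lands here; exc holds the instance.
        return "extraction error: " + type(exc).__name__


def timed_out():
    # Simulates a request that times out.
    raise TimeoutError("simulated timeout")


results = [
    fetch_first(lambda: ["title"]),  # field present
    fetch_first(lambda: []),         # field missing, IndexError
    fetch_first(timed_out),          # request failed, TimeoutError
]
print(results)
```

Because the exception is bound to exc rather than to IndexError, the builtin name is never shadowed, so any later except IndexError: clause in the same function keeps working.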