Catch exception gets UnboundLocalError

Question:

I wrote a crawler to fetch information from a Q&A website. Since not every field is present on every page, I used multiple try-except blocks to handle the missing ones.

def answerContentExtractor( loginSession, questionLinkQueue , answerContentList) :
    while True:
        URL = questionLinkQueue.get()
        try:
            response   = loginSession.get(URL,timeout = MAX_WAIT_TIME)
            raw_data   = response.text

            #These fields must exist, or something went wrong...
            questionId = re.findall(REGEX,raw_data)[0]
            answerId   = re.findall(REGEX,raw_data)[0]
            title      = re.findall(REGEX,raw_data)[0]

        except requests.exceptions.Timeout ,IndexError:
            print >> sys.stderr, URL + " extraction error..."
            questionLinkQueue.task_done()
            continue

        try:
            questionInfo = re.findall(REGEX,raw_data)[0]
        except IndexError:
            questionInfo = ""

        try:
            answerContent = re.findall(REGEX,raw_data)[0]
        except IndexError:
            answerContent = ""

        result = {
                  'questionId'   : questionId,
                  'answerId'     : answerId,
                  'title'        : title,
                  'questionInfo' : questionInfo,
                  'answerContent': answerContent
                  }

        answerContentList.append(result)
        questionLinkQueue.task_done()

This code intermittently raises the following exception at runtime:

UnboundLocalError: local variable 'IndexError' referenced before assignment

The line number indicates the error occurs at the second except IndexError:

Thanks, everyone, for your suggestions. I would love to give you all the credit you deserve; too bad I can only mark one answer as correct.

Asked By: Paul Liang


Answers:

When you say

except requests.exceptions.Timeout ,IndexError:

Python will catch only requests.exceptions.Timeout, and the exception object will be bound to the name IndexError. It should have been something like this

except (requests.exceptions.Timeout ,IndexError) as e:
Answered By: thefourtheye

In Python 2.x, the line

except requests.exceptions.Timeout, IndexError:

is equivalent to

except requests.exceptions.Timeout as IndexError:

Thus, a caught requests.exceptions.Timeout exception is assigned to the name IndexError, and a real IndexError is not caught by that clause at all. A simpler example:

try:
    true
except NameError, IndexError:
    print IndexError
    #name 'true' is not defined

To catch multiple exceptions, put the names in parentheses:

except (requests.exceptions.Timeout, IndexError):
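
As a minimal sketch of the parenthesized form, the following uses Python 3 syntax and the built-in TimeoutError as a stand-in for requests.exceptions.Timeout, so it runs without the requests library. The function name describe and its simulate_timeout parameter are hypothetical and are not part of the question's code.

```python
def describe(items, simulate_timeout=False):
    """Return "ok", or the class name of whichever exception was caught."""
    try:
        if simulate_timeout:
            # Stand-in for a request that timed out.
            raise TimeoutError("simulated timeout")
        items[0]  # raises IndexError when items is empty
        return "ok"
    except (TimeoutError, IndexError) as exc:
        # The tuple catches either exception type; exc is the instance raised.
        return type(exc).__name__


print(describe(["a"]))
print(describe([]))
print(describe(["a"], simulate_timeout=True))
```

Because the tuple only lists the exception types, neither built-in name is rebound, and later except IndexError: clauses in the same function keep working.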

Later, an UnboundLocalError can occur because the assignment to IndexError makes it a local variable (shadowing the builtin name):

>>> 'IndexError' in answerContentExtractor.func_code.co_varnames
True

So, if requests.exceptions.Timeout was not raised, IndexError will not have been (incorrectly) defined when the code attempts except IndexError:.
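
The same check can be reproduced on Python 3, where the code object is reached through __code__ rather than func_code. This is only an illustrative sketch: the two functions, shadowed and unshadowed, are hypothetical and exist only to compare the two except forms.

```python
def shadowed():
    try:
        pass
    except NameError as IndexError:  # binds IndexError as a local name
        pass


def unshadowed():
    try:
        pass
    except (NameError, IndexError):  # only reads the builtin names
        pass


# co_varnames lists the local variable names of a function.
is_local_when_shadowed = "IndexError" in shadowed.__code__.co_varnames
is_local_when_unshadowed = "IndexError" in unshadowed.__code__.co_varnames
print(is_local_when_shadowed, is_local_when_unshadowed)
```

The compiler decides that a name is local as soon as the function assigns to it anywhere, which is why the lookup fails even when the assigning except block never runs.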

Again, a simpler example:

def func():
    try:
        func # defined, so the except block doesn't run,
    except NameError, IndexError: # so the local `IndexError` isn't assigned
        pass
    try:
        [][1]
    except IndexError:
        pass
func()
#UnboundLocalError: local variable 'IndexError' referenced before assignment

In 3.x, the problem will occur (after fixing the except syntax, which makes the error more obvious) even if the first exception is caught. This is because the local name IndexError is then explicitly deleted (as if by del) at the end of the first except block.
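
A small Python 3 sketch of that behavior follows. The function name shadowing_demo and the variable outcome are hypothetical, and the first exception is raised deliberately so that the first except block does run.

```python
def shadowing_demo():
    try:
        raise NameError("simulated failure")
    except NameError as IndexError:  # rebinds IndexError as a local name
        pass
    # Python 3 deletes the `as` target when the except block ends,
    # so the local name IndexError is unbound from here on.
    try:
        [][1]
    except IndexError:  # looking up the local name fails here
        pass


outcome = None
try:
    shadowing_demo()
except UnboundLocalError as exc:
    outcome = type(exc).__name__

print(outcome)
```

The exact wording of the error message differs between Python 3 versions, but the exception type is UnboundLocalError in each case.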

Answered By: Ashwini Chaudhary

except requests.exceptions.Timeout ,IndexError:

means the same as except requests.exceptions.Timeout as IndexError, so only Timeout is caught and the exception object is bound to the name IndexError.

You should use

except (requests.exceptions.Timeout, IndexError):

instead

Answered By: Kimvais