How do I efficiently check if data was returned in my GET request?

Question:

I am web scraping and need to parse a few thousand GET responses at a time. Some of these requests fail with 429 and/or 403 errors, so I need to check whether there is data before parsing the response. I wrote this function:

from bs4 import BeautifulSoup

def check_response(response):
    if not response or not response.content:
        return False
    else:
        soup = BeautifulSoup(response.content, "html.parser")
        if not soup or not soup.find_all(attrs={"class": "stuff"}):
            return False

    return True

This works, but it can take quite a while to loop through a few thousand responses. Is there a better way?

Asked By: Dunc


Answers:

You can use the response.status_code attribute to check the status code of the response. MDN has a full list of HTTP status codes, but any code >= 400 is an error: 4xx codes such as 403 and 429 are client errors, and 5xx codes are server errors. Try this code:

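from bs4 import BeautifulSoup

def check_response(response):
    # A requests.Response is already falsy for 4xx/5xx codes (bool(response) is response.ok),
    # so the explicit status_code test mainly makes the intent obvious.
    if not response or not response.content or response.status_code >= 400:
        return False
    else:
        soup = BeautifulSoup(response.content, "html.parser")
        if not soup or not soup.find_all(attrs={"class": "stuff"}):
            return False
    return True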

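Note that the final return True is only reached when the else branch finds at least one matching element; every other path has already returned False, so it works at function level and does not need to be indented into the else.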
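As a sketch of putting the cheap checks first, the hypothetical helper worth_parsing below (not part of requests or BeautifulSoup) takes the status code and the raw body bytes and only returns True when a full parse could possibly find something. It assumes the class name appears literally in the HTML bytes; a page that writes it with character references (for example &#115;tuff) would be wrongly skipped.

```python
def worth_parsing(status_code: int, content: bytes, marker: bytes = b"stuff") -> bool:
    """Cheap pre-checks to run before handing a page to BeautifulSoup.

    Hypothetical helper. Assumes the class name appears literally in the
    raw HTML bytes (not written with character references).
    """
    # 1. Status code: a plain integer comparison, no body access needed.
    if status_code >= 400:
        return False
    # 2. Empty body: nothing to parse.
    if not content:
        return False
    # 3. Byte substring search: if the marker never occurs in the page,
    #    no element can carry that class, so the parse can be skipped.
    return marker in content
```

With requests you would call it as worth_parsing(response.status_code, response.content) and only build the soup when it returns True. The substring test can give false positives (the word may appear in visible text rather than in a class attribute), so positives still go through BeautifulSoup; only the negatives skip the parse.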

Answered By: Michael M.

Notwithstanding the comments by @Michael M., I propose the following:

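import requests
from bs4 import BeautifulSoup

def check_response(response):
    # response is the requests.Response returned by requests.get();
    # raise_for_status() raises requests.HTTPError for 4xx/5xx codes (e.g. 429, 403).
    try:
        response.raise_for_status()
        # 'lxml' is usually faster than 'html.parser' but requires the lxml package.
        soup = BeautifulSoup(response.text, 'lxml')
        if soup.find_all(attrs={"class": "stuff"}):
            return True
    except requests.HTTPError:
        # Catch only HTTP errors so genuine bugs (e.g. an AttributeError) are not silently swallowed.
        pass
    return False

Answered By: Pingu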
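Both answers still build a complete BeautifulSoup tree and collect every match with find_all, even though the question only needs to know whether one element with class stuff exists. One alternative (a swapped-in technique, not what either answer uses) is the standard library's html.parser with an early exit: raise an exception from the start-tag handler at the first match so the rest of the page is never processed. The names has_class, _ClassFinder and _Found are hypothetical.

```python
from html.parser import HTMLParser


class _Found(Exception):
    """Raised internally to abort parsing once a match is seen."""


class _ClassFinder(HTMLParser):
    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            # class is a space-separated list, e.g. class="stuff highlighted".
            if name == "class" and value and self.class_name in value.split():
                raise _Found


def has_class(html: str, class_name: str = "stuff") -> bool:
    """Return True as soon as an element with the given class is seen."""
    finder = _ClassFinder(class_name)
    try:
        finder.feed(html)
        finder.close()
    except _Found:
        return True
    return False
```

For example, has_class(response.text) could replace the BeautifulSoup step once the status check has passed. Like BeautifulSoup's class matching, it treats class as a space-separated list, so class="stuff other" matches but class="stuffing" does not. Whether it is actually faster depends on how early the first match appears in the page, so it is worth timing on your own responses.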