How do you obtain underlying failed request data when catching requests.exceptions.RetryError?

Question:

I am using a somewhat standard pattern for putting retry behavior around HTTP calls made with the requests library in Python,

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

retry_strategy = Retry(
    total=HTTP_RETRY_LIMIT,
    status_forcelist=HTTP_RETRY_CODES,
    method_whitelist=HTTP_RETRY_METHODS,
    backoff_factor=HTTP_BACKOFF_FACTOR
)
adapter = HTTPAdapter(max_retries=retry_strategy)
http = requests.Session()
http.mount("https://", adapter)
http.mount("http://", adapter)

...

try:
    response = http.get(... some request params ...)
except requests.exceptions.RetryError as err:
    # Do logic with err to perform error handling & logging.
    ...

Unfortunately the docs on RetryError don’t explain anything, and when I intercept the exception object as above, err.response is None. While you can call str(err) to get the exception’s message string, recovering the specific response details from it would require unreasonable string parsing, and even then the message elides the necessary details. For example, a deliberate call to a site returning 400s (not that you would really retry on this, but just for debugging) gives a message of "(Caused by ResponseError('too many 400 error responses'))" – which omits the actual response details, such as the site’s own description of the nature of the 400 error (which could be critical to determining handling, or even just to pass back for logging the error).

What I want to do is receive the response for the last unsuccessful retry attempt and use the status code and description of that specific failure to determine the handling logic. Even though I want to make it robust behind retries, I still need to know the underlying failure beyond "too many retries" when ultimately handling the error.

Is it possible to extract this information from the exception raised for retries?

Asked By: ely


Answers:

It’s not directly supported by the libraries, but it’s possible to achieve by subclassing Retry to attach the response to MaxRetryError:

from requests.adapters import MaxRetryError, Retry


class MyRetry(Retry):

    def increment(self, *args, **kwargs):
        try:
            return super().increment(*args, **kwargs)
        except MaxRetryError as ex:
            response = kwargs.get('response')
            if response:
                response.read(cache_content=True)
                ex.response = response
            raise

Usage:

# retry_strategy = Retry(
retry_strategy = MyRetry(

...

except requests.exceptions.RetryError as err:
    # Do logic with err to perform error handling & logging.
    print(err.args[0].response.status)
    print(err.args[0].response.data)
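To see the mechanism end to end without a live server, here is a self-contained sketch that drives increment() directly with a stub response object (the Fake503Response class is purely illustrative, standing in for urllib3’s HTTPResponse):

```python
from requests.adapters import HTTPAdapter, MaxRetryError, Retry


class MyRetry(Retry):
    """Retry subclass that attaches the final response to MaxRetryError."""

    def increment(self, *args, **kwargs):
        try:
            return super().increment(*args, **kwargs)
        except MaxRetryError as ex:
            response = kwargs.get('response')
            if response:
                response.read(cache_content=True)  # cache the body before the pool releases it
                ex.response = response
            raise


class Fake503Response:
    """Illustrative stub with the attributes increment() and MyRetry touch."""
    status = 503
    data = b"Service Unavailable"

    def get_redirect_location(self):
        return None

    def read(self, cache_content=False):
        return self.data


# total=0 means the very first increment() exhausts the retries
retry = MyRetry(total=0)
try:
    retry.increment(method='GET', url='/flaky', response=Fake503Response())
except MaxRetryError as err:
    print(err.response.status)  # 503
    print(err.response.data)    # b'Service Unavailable'
```

In real use the connection pool calls increment() for you on each failed attempt; the stub only exists here to trigger that path deterministically.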
Answered By: aaron

As already indicated by aaron, the exception you are trying to catch and the one raised by the library are not the same. This also depends heavily on the library version, as they seem to have changed things around with Retry (it is also available via from requests.adapters import Retry, and likewise RetryError).

Working Code

For the following code, tested on requests==2.27.1 and python==3.7.12, with Retry from urllib3 as you used it:

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry


HTTP_RETRY_LIMIT = 1
HTTP_RETRY_CODES = [403, 400, 401, 429, 500, 502, 503, 504]
HTTP_RETRY_METHODS = ['HEAD', 'GET', 'OPTIONS', 'TRACE', 'POST']
HTTP_BACKOFF_FACTOR = 1

retry_strategy = Retry(
    total=HTTP_RETRY_LIMIT,
    status_forcelist=HTTP_RETRY_CODES,
    allowed_methods=HTTP_RETRY_METHODS, # changed to allowed_methods
    backoff_factor=HTTP_BACKOFF_FACTOR
)
adapter = HTTPAdapter(max_retries=retry_strategy)
http = requests.Session()
http.mount("https://", adapter)
http.mount("http://", adapter)
try:
    response = http.get('https://www.howtogeek.com/wp-content/uploads/2018/06/')
except (requests.exceptions.RetryError, requests.exceptions.ConnectionError) as err:
    # Do logic with err to perform error handling & logging.
    print(err)
    print(err.args[0].reason)

I get the following output:

requests.exceptions.RetryError: HTTPSConnectionPool(host='www.howtogeek.com', port=443): Max retries exceeded with url: /wp-content/uploads/2018/06/ (Caused by ResponseError('too many 403 error responses'))
too many 403 error responses
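If all you need from that reason is the status code, a last-resort sketch is to parse it out of urllib3’s ResponseError message (fragile, since it depends on urllib3’s message format staying the same):

```python
import re
from urllib3.exceptions import ResponseError

# urllib3 formats status-driven retry exhaustion as
# "too many {status_code} error responses"
reason = ResponseError("too many 403 error responses")
match = re.search(r"too many (\d+) error responses", str(reason))
status = int(match.group(1)) if match else None
print(status)  # 403
```

This recovers only the status code, not the response body, so it doesn’t fully solve the original problem; aaron’s subclassing approach is needed for the body.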

Alternative with sys.exc_info()

If this isn’t enough, you can look into the traceback module or use sys.exc_info() (index 0, 1, or 2 for the exception class, the exception instance, and the traceback, respectively). In your case you would do something like:

import traceback, sys
try:
    response = http.get('https://www.howtogeek.com/wp-content/uploads/2018/06/')
except (requests.exceptions.RetryError, requests.exceptions.ConnectionError) as err:
    # Do logic with err to perform error handling & logging.
    print(sys.exc_info()[0]) # just the class of the exception

This returns the exception class, which you can use for error handling; it can also be combined with catching the generic Exception:

<class 'requests.exceptions.ConnectionError'>

This gives you a lot of control: info = sys.exc_info()[1] yields the actual exception object, so you can access its attributes:

print(info.request.url)
print(info.request.headers)
# and probably most important for you
print(info.args[0].reason) # urllib3.exceptions.ResponseError('too many 403 error responses')

And obtain the resulting info you require:

https://www.howtogeek.com/wp-content/uploads/2018/06/
{'User-Agent': 'python-requests/2.27.1', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}
too many 403 error responses
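For what it’s worth, inside an except block sys.exc_info()[1] is the very same object that as err binds, so the two approaches are interchangeable; a minimal demonstration (using a plain ValueError for illustration):

```python
import sys

try:
    raise ValueError("boom")
except ValueError as err:
    exc_type, exc_value, tb = sys.exc_info()
    assert exc_type is ValueError
    assert exc_value is err  # identical to the object bound by `as err`
    print(exc_value.args[0])  # boom
```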

Alternatively, for even more information, there is the full traceback (though using it depends on parsing):

print(traceback.format_exc()) # Returns full stack trace, might not be most useful in your case
Answered By: Warkaz

We can’t get a response in every exception, because a request may not have been sent yet, or the request or response may not have reached its destination. For example, these exceptions don’t carry a response:

urllib3.exceptions.ConnectTimeoutError
urllib3.exceptions.SSLError
urllib3.exceptions.NewConnectionError

There’s a parameter in urllib3.util.Retry named raise_on_status, which defaults to True. If it is set to False, urllib3.exceptions.MaxRetryError won’t be raised.
And if no exception is raised, it is certain that a response has arrived. It then becomes easy to call response.raise_for_status() in the else clause of the try block, wrapped in another try.

I’ve changed except RetryError to except Exception to catch other exceptions.

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry
from requests.exceptions import RetryError

# DEFAULT_ALLOWED_METHODS = frozenset({'DELETE', 'GET', 'HEAD', 'OPTIONS', 'PUT', 'TRACE'})
#     Default methods to be used for allowed_methods
# RETRY_AFTER_STATUS_CODES = frozenset({413, 429, 503})
#     Default status codes to be used for status_forcelist

HTTP_RETRY_LIMIT = 3
HTTP_BACKOFF_FACTOR = 0.2

retry_strategy = Retry(
    total=HTTP_RETRY_LIMIT,
    backoff_factor=HTTP_BACKOFF_FACTOR,
    raise_on_status=False,
)
adapter = HTTPAdapter(max_retries=retry_strategy)
http = requests.Session()
http.mount("https://", adapter)
http.mount("http://", adapter)
try:
    response = http.get("https://httpbin.org/status/503")
except Exception as err:
    print(err)
else:
    try:
        response.raise_for_status()
    except Exception as e:
        # Do logic with e to perform error handling & logging.
        print(response.reason)
        # Or
        # print(e.response.reason)
    else:
        print(response.text)

Test:

# https://httpbin.org/user-agent
➜  python requests_retry.py
{
  "user-agent": "python-requests/2.28.1"
}

# url =  https://httpbin.org/status/503
➜  python requests_retry.py
SERVICE UNAVAILABLE
Answered By: Nizam Mohamed