Dump Django stack trace on Gunicorn timeout
Question:
I’m trying to debug rare hangs of a Django application. So far I failed to isolate the problem, it happens about once a day in production and Gunicorn restarts the process with a message:
[CRITICAL] WORKER TIMEOUT
Is there a way to configure Django or Gunicorn to dump a stack trace of a process that is restarted?
Answers:
Try setting your Gunicorn log to be more verbose, maybe set it to INFO
or DEBUG
that might shed more light in the log.
You could also take a look at Dog Slow, which will log slow requests. https://pypi.python.org/pypi/dogslow.
And for general logging win, try using Sentry: https://www.getsentry.com/welcome/.
Random question, any crons on the server running at that time, backups, that sorta thing?
timeout is not meant as a request timeout. It’s meant as a liveness check for workers. For sync workers, this functions as a request timeout because the worker cannot do anything other than process the request. The asynchronous workers heartbeat even while they are handling long running requests, so unless the worker blocks/freezes it won’t be killed.
Gunicorn has a function called worker_abort (see gunicorn docs below).
def worker_abort(worker):
worker.log.info("worker received abort signal")
import threading, sys, traceback
id2name = dict([(th.ident, th.name) for th in threading.enumerate()])
code = []
for threadId, stack in sys._current_frames().items():
code.append("n# Thread: %s(%d)" % (id2name.get(threadId,""), threadId))
stack = traceback.extract_stack(stack)
for filename, lineno, name, line in stack:
code.append('File: "%s", line %d, in %s' % (filename, lineno, name))
if line:
code.append(" %s" % (line.strip()))
worker.log.debug("n".join(code))
Called when a worker received the SIGABRT signal. This call generally happens on timeout. The callable needs to accept one instance variable for the initialised Worker.
Sources:
http://docs.gunicorn.org/en/stable/settings.html, https://github.com/benoitc/gunicorn/issues/1493, https://github.com/benoitc/gunicorn/blob/master/examples/example_config.py
This will print a stack trace for the worker at the time that it gets killed. You’ll need to create a gunicorn config file, and paste the following function into it.
import traceback
import io
def worker_abort(worker):
debug_info = io.StringIO()
debug_info.write("Traceback at time of timeout:n")
traceback.print_stack(file=debug_info)
worker.log.critical(debug_info.getvalue())
I’m trying to debug rare hangs of a Django application. So far I failed to isolate the problem, it happens about once a day in production and Gunicorn restarts the process with a message:
[CRITICAL] WORKER TIMEOUT
Is there a way to configure Django or Gunicorn to dump a stack trace of a process that is restarted?
Try setting your Gunicorn log to be more verbose, maybe set it to INFO
or DEBUG
that might shed more light in the log.
You could also take a look at Dog Slow, which will log slow requests. https://pypi.python.org/pypi/dogslow.
And for general logging win, try using Sentry: https://www.getsentry.com/welcome/.
Random question, any crons on the server running at that time, backups, that sorta thing?
timeout is not meant as a request timeout. It’s meant as a liveness check for workers. For sync workers, this functions as a request timeout because the worker cannot do anything other than process the request. The asynchronous workers heartbeat even while they are handling long running requests, so unless the worker blocks/freezes it won’t be killed.
Gunicorn has a function called worker_abort (see gunicorn docs below).
def worker_abort(worker):
worker.log.info("worker received abort signal")
import threading, sys, traceback
id2name = dict([(th.ident, th.name) for th in threading.enumerate()])
code = []
for threadId, stack in sys._current_frames().items():
code.append("n# Thread: %s(%d)" % (id2name.get(threadId,""), threadId))
stack = traceback.extract_stack(stack)
for filename, lineno, name, line in stack:
code.append('File: "%s", line %d, in %s' % (filename, lineno, name))
if line:
code.append(" %s" % (line.strip()))
worker.log.debug("n".join(code))
Called when a worker received the SIGABRT signal. This call generally happens on timeout. The callable needs to accept one instance variable for the initialised Worker.
Sources:
http://docs.gunicorn.org/en/stable/settings.html, https://github.com/benoitc/gunicorn/issues/1493, https://github.com/benoitc/gunicorn/blob/master/examples/example_config.py
This will print a stack trace for the worker at the time that it gets killed. You’ll need to create a gunicorn config file, and paste the following function into it.
import traceback
import io
def worker_abort(worker):
debug_info = io.StringIO()
debug_info.write("Traceback at time of timeout:n")
traceback.print_stack(file=debug_info)
worker.log.critical(debug_info.getvalue())