How do I dump an entire Python process for later debugging inspection?

Question:

I have a Python application in a strange state. I don’t want to do live debugging of the process. Can I dump it to a file and examine its state later? I know I’ve restored core files of C programs in gdb before, but I don’t know how to examine a Python application in a useful way from gdb.

(This is a variation on my question about debugging memleaks in a production system.)

Asked By: keturn


Answers:

There is no built-in way other than aborting (with os.abort(), which causes a core dump if resource limits allow it), although you can certainly build your own ‘dump’ function that writes out the data you care about. There are no ready-made tools for it.
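For instance, a minimal sketch of that route on a Unix system (the dump_core name is my own; where the core file lands is controlled by the OS, e.g. core_pattern on Linux):

```python
import os
import resource

def dump_core():
    # Raise the soft core-size limit to the hard limit so the kernel
    # actually writes a core file; a soft limit of 0 silently suppresses it.
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    os.abort()  # raises SIGABRT; the process dies and the kernel dumps core
```

Note that os.abort() kills the process, so this is strictly a last-resort snapshot.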

As for handling the core file of a Python process, the Python source has a gdbinit file (Misc/gdbinit in the CPython tree) that contains useful macros. It’s still a lot more painful than somehow getting into the process itself (with pdb or the interactive interpreter), but it makes life a little easier.

Answered By: Thomas Wouters

Someone above said that there is no built-in way to do this, but that’s not entirely true. For an example, you could take a look at the Pylons debugging tools. When there is an exception, the exception handler saves the stack trace and prints a URL on the console that can be used to retrieve the debugging session over HTTP.

While they’re probably keeping these sessions in memory, they’re just Python objects, so there’s nothing to stop you from pickling a stack dump and restoring it later for inspection. It would mean some changes to the app, but it should be possible…

After some research, it turns out the relevant code actually comes from Paste’s EvalException module. You should be able to look there to figure out what you need.
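A stdlib-only sketch of that idea, with illustrative names (dump_stack, stack.pickle): walk the traceback on an exception, keep repr()s of the locals so unpicklable values can’t break the dump, and pickle the result for later inspection.

```python
import pickle
import sys

def dump_stack(path="stack.pickle"):
    """Pickle a simplified dump of the current exception's stack."""
    _, exc, tb = sys.exc_info()
    frames = []
    while tb is not None:
        frame = tb.tb_frame
        frames.append({
            "file": frame.f_code.co_filename,
            "function": frame.f_code.co_name,
            "line": tb.tb_lineno,
            # repr() keeps unpicklable locals (sockets, files, ...) harmless
            "locals": {k: repr(v) for k, v in frame.f_locals.items()},
        })
        tb = tb.tb_next
    with open(path, "wb") as f:
        pickle.dump({"exception": repr(exc), "frames": frames}, f)

try:
    answer = 42
    1 / 0
except ZeroDivisionError:
    dump_stack()

# Later, possibly in a different process:
with open("stack.pickle", "rb") as f:
    print(pickle.load(f)["frames"][-1]["locals"])
```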

Answered By: Douglas Mayle

This answer suggests making your program dump core and then continuing execution on another, sufficiently similar box.

Answered By: Alex Coventry

It’s also possible to write something that would dump all the data from the process, e.g.

  • A Pickler that ignores the objects it can’t pickle, replacing them with placeholders (e.g. Python: Pickling a dict with some unpicklable items; see the sketch after this list)
  • A method that recursively converts everything into serializable values (e.g. this, though it would need a check for infinitely recursive objects and a way to handle them; it could also try dir() and getattr() to process some of the otherwise unknown objects, e.g. extension classes).
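A minimal sketch of the first approach, using pickle’s persistent-ID hook (the class names and placeholder format are my own; note that a non-container object whose internals can’t be pickled gets replaced as a whole):

```python
import io
import pickle

CONTAINERS = (dict, list, tuple, set, frozenset)

class TolerantPickler(pickle.Pickler):
    """Substitutes a placeholder string for anything it can't pickle."""
    def persistent_id(self, obj):
        if isinstance(obj, CONTAINERS):
            return None  # recurse into containers; only replace the leaves
        try:
            pickle.dumps(obj)  # probe: can this object be pickled on its own?
            return None        # None means "pickle normally"
        except Exception:
            return f"<unpicklable: {type(obj).__name__}>"

class TolerantUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        return pid  # the placeholder stands in for the lost object

data = {"fine": [1, 2, 3], "bad": lambda: None}
buf = io.BytesIO()
TolerantPickler(buf).dump(data)
buf.seek(0)
print(TolerantUnpickler(buf).load())
# {'fine': [1, 2, 3], 'bad': '<unpicklable: function>'}
```

The probe makes this quadratic in the worst case, which is usually acceptable for a one-off debugging dump.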

But leaving a hook into the running process with manhole or Pylons or something like that certainly seems more convenient when possible.
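For reference, the manhole route is only a couple of lines (assuming the third-party manhole package; the socket path shown is its documented default on Linux):

```python
import manhole

# Opens a UNIX-domain socket (by default /tmp/manhole-<pid>) that serves
# a Python REPL inside this process; connect to it later with e.g.:
#   nc -U /tmp/manhole-<pid>
manhole.install()
```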

(Also, I wonder whether something more convenient has been written since this question was first asked.)

Answered By: HoverHell

If you only care about storing the traceback object (which is all you need to start a debugging session), you can use debuglater (a fork of pydump). It works with recent versions of Python and has IPython/Jupyter integration.

If you want to store the entire session, look at dill. It has dump_session and load_session functions.
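For example (session.pkl is an arbitrary filename):

```python
import dill

counter = 42
cache = {"seen": [1, 2, 3]}

# Serialize the entire __main__ namespace to disk
dill.dump_session("session.pkl")

# Later, in a fresh interpreter:
#   import dill
#   dill.load_session("session.pkl")
#   print(counter, cache)  # the globals are back
```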


If you’re looking for a language-agnostic solution, you want to create a core dump file. Here’s an example with Python.

Answered By: Edu