Where is Python's shutdown procedure setting module globals to None documented?

Question:

CPython has a strange behaviour where it sets module globals to None during shutdown. This screws up error logging during shutdown of some multithreading code I’ve written.

I can’t find any documentation of this behaviour. It’s mentioned in passing in PEP 432:

[…] significantly reducing the number of modules that will experience the “module globals set to None” behaviour that is used to deliberate break cycles and attempt to releases more external resources cleanly.

There are SO questions about this behaviour and the C API documentation mentions shutdown behaviour for embedded interpreters.

I’ve also found a related thread on python-dev and a related CPython bug:

This patch does not change the behavior of module
objects clearing their globals dictionary as soon as
they are deallocated.

Where is this behaviour documented? Is it Python 2 specific?

Asked By: Wilfred Hughes

Answers:

The behaviour is not well documented. It is present in all versions of Python from about 1.5 until Python 3.4, whose release notes say:

As part of this change, module globals are no longer forcibly set to None during interpreter shutdown in most cases, instead relying on the normal operation of the cyclic garbage collector.

The only documentation for the behaviour is a comment in the moduleobject.c source code:

/* To make the execution order of destructors for global
   objects a bit more predictable, we first zap all objects
   whose name starts with a single underscore, before we clear
   the entire dictionary.  We zap them by replacing them with
   None, rather than deleting them from the dictionary, to
   avoid rehashing the dictionary (to some extent). */
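
To make the two passes described in that comment concrete, here is a rough Python rendering. It is only a sketch of the idea, not the actual C code: clear_namespace and the sample dict are made-up names, and leaving __builtins__ alone is an assumption based on my reading of the source.

```python
def clear_namespace(namespace):
    """Replace values with None in two passes and return the order used."""
    order = []
    # Pass 1: names starting with a single underscore go first.
    for name in list(namespace):
        single_underscore = name.startswith("_") and not name.startswith("__")
        if single_underscore and namespace[name] is not None:
            namespace[name] = None
            order.append(name)
    # Pass 2: everything else (assumption: __builtins__ is left alone).
    for name in list(namespace):
        if name != "__builtins__" and namespace[name] is not None:
            namespace[name] = None
            order.append(name)
    return order


# Hypothetical stand-in for a module's __dict__.
ns = {"public": 1, "_private": 2, "__name__": "demo", "__builtins__": {}}
order = clear_namespace(ns)
print(order)       # ['_private', 'public', '__name__']
print(sorted(ns))  # every key is still present, only the values changed
```

The keys stay in the dictionary throughout, which is the point of replacing values with None instead of deleting entries.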

Note that setting the values to None is an optimisation; the alternative would be to delete names from the mapping, which would lead to different errors (NameError exceptions rather than AttributeErrors when trying to use globals from a __del__ handler).
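
The difference between the two failure modes can be reproduced without shutting down an interpreter. The sketch below uses a plain dict as a stand-in for a module's globals; helper, use_helper and outcome are names invented for the illustration.

```python
# A plain dict standing in for a module's globals.
namespace = {}
exec(
    "helper = 'some text'\n"
    "def use_helper():\n"
    "    return helper.upper()\n",
    namespace,
)
use_helper = namespace["use_helper"]


def outcome():
    """Call use_helper() and report the result or the exception type."""
    try:
        return use_helper()
    except (AttributeError, NameError) as exc:
        return type(exc).__name__


before = outcome()          # global still intact
namespace["helper"] = None  # what the old shutdown code did
after_none = outcome()      # None has no .upper attribute
del namespace["helper"]     # the alternative: remove the name
after_delete = outcome()    # the global lookup itself fails

print(before, after_none, after_delete)
# SOME TEXT AttributeError NameError
```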

As you found out on the mailing list, the behaviour predates the cyclic garbage collector; it was added in 1998, while the cyclic garbage collector was added in 2000. Since function objects always reference the module __dict__, all function objects in a module involve circular references, which is why the __dict__ needed clearing before GC came into play.
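
The circular reference is easy to see from Python itself. A small sketch, using a throwaway module object (demo_module and f are made-up names):

```python
import types

# Build a throwaway module and define a function inside its namespace.
mod = types.ModuleType("demo_module")
exec("def f():\n    return 1\n", mod.__dict__)

# The function's globals are the very same dict object as the module __dict__ ...
same_dict = mod.f.__globals__ is mod.__dict__
# ... and that dict in turn holds the function, which closes the cycle.
dict_holds_function = mod.__dict__["f"] is mod.f

print(same_dict, dict_holds_function)  # True True
```

With pure reference counting, neither the dict nor the function can reach a count of zero until one of those two links is broken.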

It was kept in place even when cyclic GC was added, because there might be objects with __del__ methods involved in cycles. These aren’t otherwise garbage-collectable, and cleaning out the module dictionary would at least remove the module __dict__ from such cycles. Not doing that would keep all referenced globals of that module alive.

The changes made for PEP 442 now make it possible for the garbage collector to clear cyclic references involving objects that provide a __del__ finalizer, removing the need to clear the module __dict__ in most cases. The code is still there, but it is only triggered if the module __dict__ is still alive even after the contents of sys.modules have been moved to weak references and a GC collection run has been started during interpreter shutdown; the module finalizer simply decrements the reference count of the __dict__.
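
The PEP 442 part can be checked directly. This is a minimal sketch assuming Python 3.4 or newer; Node and finalized are names invented for the example. On older versions the two objects would end up in gc.garbage instead of being finalized.

```python
import gc

finalized = []


class Node:
    """An object with a __del__ finalizer that can take part in a cycle."""

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


# Build a two-object reference cycle, then drop the outside references.
a = Node("a")
b = Node("b")
a.other = b
b.other = a
del a, b

gc.collect()  # the cyclic collector finds the cycle and runs both finalizers
print(sorted(finalized))  # ['a', 'b'] on Python 3.4+
```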

Answered By: Martijn Pieters

There is a small amount of related documentation at the bottom of the threading docs:

Secondly, all import attempts must be completed before the interpreter starts shutting itself down. [..] Failure to abide by this restriction will lead to intermittent exceptions and crashes during interpreter shutdown (as the late imports attempt to access machinery which is no longer in a valid state).
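
In practice that means doing imports at module level and stopping worker threads before teardown begins. One possible pattern is sketched below; it is not taken from the docs, and worker, stop_worker, stop_event and errors are made-up names. It relies on atexit handlers running before modules are torn down, which is my understanding of CPython's shutdown order.

```python
import atexit
import threading
import traceback  # imported up front, not lazily inside the worker

stop_event = threading.Event()
errors = []


def worker():
    """Loop until asked to stop, recording any errors from the work step."""
    while not stop_event.wait(0.01):
        try:
            pass  # placeholder for the real work
        except Exception:
            errors.append(traceback.format_exc())


thread = threading.Thread(target=worker, daemon=True)


def stop_worker():
    """Signal the worker and wait for it; safe to call more than once."""
    stop_event.set()
    if thread.is_alive():
        thread.join(timeout=5)


# Stop the worker while module globals are still intact.
atexit.register(stop_worker)
thread.start()
```

Calling stop_worker() explicitly at the end of the program works as well; the atexit registration is only a safety net.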

Answered By: Wilfred Hughes