How does Python's Garbage Collector Detect Circular References?

Question:

I’m trying to understand how Python’s garbage collector detects circular references. When I look at the documentation, all I see is a statement that circular references are detected, except when the objects involved have a __del__ method.

If this happens, my understanding (possibly faulty) is that the gc module acts as a failsafe by (I assume) walking through all the allocated memory and freeing any unreachable blocks.

How does Python detect & free circular memory references before making use of the gc module?

Asked By: user1245262

||

Answers:

How does Python detect & free circular memory references before making use of the gc module?

It doesn’t. The gc exists only to detect and free circular references. Non-circular references are handled through refcounting.

Now, to see how gc determines the set of objects referenced by any given object, take a look at the gc_get_references function in Modules/gcmodule.c. The relevant bit is:

// Where `obj` is the object who's references we want to find
traverseproc traverse;
if (! PyObject_IS_GC(obj))
    continue;
traverse = Py_TYPE(obj)->tp_traverse;
if (! traverse)
    continue;
if (traverse(obj, (visitproc)referentsvisit, result)) {
    Py_DECREF(result);
    return NULL;
}

The major function here is tp_traverse. Each C-level type defines a tp_traverse function (or in the case of objects which don’t hold any references, like str, sets it to NULL). One example of tp_traverse is list_traverse, the traversal function for list:

static int
list_traverse(PyListObject *o, visitproc visit, void *arg)
{
    Py_ssize_t i;

    for (i = Py_SIZE(o); --i >= 0; )
        Py_VISIT(o->ob_item[i]);
    return 0;
}

I see is a statement that circular references are detected, except when the objects involved have a __del__() method.

You are correct — Python’s cycle detector can detect and collect cycles unless they contain objects with a __del__ method, as there is no way for the interpreter to safely delete these objects (to get an intuition on why this is, imagine you’ve got two objects with __del__ methods that reference each other. In which order should they be freed?).

When objects with a __del__ method are involved in a cycle, the garbage collector will stick them in a separate list (accessible through gc.garbage) so that the programmer can manually “deal with” them.

Answered By: David Wolever

How does Python detect & free circular memory references before making use of the gc module?

Python’s garbage collector (not actually the gc module, which is just the Python interface to the garbage collector) does this. So, Python doesn’t detect and free circular memory references before making use of the garbage collector.

Python ordinarily frees most objects as soon as their reference count reaches zero. (I say “most” because it never frees, for example, small integers or interned strings.) In the case of circular references, this never happens, so the garbage collector periodically walks memory and frees circularly-referenced objects.

This is all CPython-specific, of course. Other Python implementations have different memory management (Jython = Java VM, IronPython = Microsoft .NET CLR).

Answered By: kindall

I think I found the answer I’m looking for in some links provided by @SvenMarnich in comments to the original question:

Container objects are Python objects that can hold references to other Python objects. Lists, Classes, Tuples etc are container objects; Integers, Strings etc. are not. So, only container objects are at risk for being in a circular reference.

Each Python object has a field – *gc_ref*, which is (I believe) set to NULL for non-container objects. For container objects it is set equal to the number of non container objects that reference it

Any container object with a *gc_ref* count greater than 1 (? I would’ve thought 0, but OK for now ?) has references that are not container objects. So they are reachable and are removed from consideration of being unreachable memory islands.

Any container object reachable by an object known to be reachable (i.e. those we just recognized as having a *gc_ref* count greater than 1) also does not need to be freed.

The remaining container objects are not reachable (except by each other) and should be freed.

http://www.arctrix.com/nas/python/gc/ is a link providing a fuller explanation
http://hg.python.org/cpython/file/2059910e7d76/Modules/gcmodule.c is a link to the source code, which has comments further explaining the thoughts behind the circular reference detection

Answered By: user1245262