Are all Python objects tracked by the garbage collector?

Question:

I’m trying to debug a memory leak (see question Memory leak in Python Twisted: where is it?).

When the garbage collector is running, does it have access to all Python objects created by the Python interpreter? If we suppose Python C libraries are not leaking, should RSS memory usage grow linearly with respect to the GC object count? What about sys.getobjects?

Asked By: Tommy

Answers:

The RSS does not grow linearly with the number of Python objects, because Python objects can vary in size. An int object is usually much smaller than a big list.
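To get a feel for how much per-object sizes differ, sys.getsizeof can be used as a rough probe. This is only a sketch: the names small and big are made up for illustration, and the exact numbers depend on the CPython version and platform.

```python
import sys

small = 1
big = list(range(100_000))

# getsizeof reports only the object's own footprint. For the list that is the
# header plus the array of pointers, not the int objects it points to.
print("int :", sys.getsizeof(small), "bytes")
print("list:", sys.getsizeof(big), "bytes")
```

Two processes with the same object count can therefore have very different RSS.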

I assume you meant gc.get_objects when you wrote sys.getobjects. This function returns a list of all objects tracked by the collector, which means container objects rather than every reachable object. If you suspect a leak, you can iterate over this list and look for objects that should already have been freed. (For instance, you might know that all objects of a certain type are supposed to be freed at a certain point.)
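One way to do that kind of search is to count tracked objects by type before and after the suspicious code runs. A minimal sketch, assuming a hypothetical Leaky class and a helper named tracked_type_counts that stand in for your own code:

```python
import gc
from collections import Counter

def tracked_type_counts():
    """Count the objects currently tracked by the cyclic GC, grouped by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())

class Leaky:  # hypothetical class standing in for whatever is leaking
    pass

before = tracked_type_counts()
leaked = [Leaky() for _ in range(1000)]  # simulate objects that are never released
after = tracked_type_counts()

# Counter subtraction keeps only the types whose count went up.
growth = after - before
print(growth.most_common(5))
```

Types that keep growing between snapshots are the first candidates to look at. Objects that the collector does not track (strings, ints and so on) will not show up here.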

Answered By: Helmut Grohne

CPython uses two mechanisms to clean up garbage. One is reference counting, which affects all objects but which can’t clean up objects that (directly or indirectly) refer to each other. That’s where the actual garbage collector comes in: Python has the gc module, which searches for cyclic references in objects it knows about. Only objects that can potentially be part of a reference cycle need to worry about participating in the cyclic gc. So, for example, lists do, but strings do not; strings don’t reference any other objects. (In fact, the story is a little more complicated, as there are two ways of participating in cyclic gc, but that isn’t really relevant here.)
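The split between the two mechanisms can be observed with a small experiment. This is a sketch, not part of the original answer: Node is a hypothetical class, and a weak reference is used only to watch whether the object is still alive.

```python
import gc
import weakref

class Node:  # hypothetical class used only to build a cycle
    def __init__(self):
        self.other = None

was_enabled = gc.isenabled()
gc.disable()  # keep automatic collection from interfering with the demo

a, b = Node(), Node()
a.other, b.other = b, a        # a and b now refer to each other
probe = weakref.ref(a)         # lets us observe a without keeping it alive

del a, b
# Reference counting alone cannot free the pair: each still holds the other.
alive_after_del = probe() is not None

gc.collect()                   # the cyclic collector finds and frees the cycle
alive_after_collect = probe() is not None

if was_enabled:
    gc.enable()

print(alive_after_del, alive_after_collect)
```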

All Python classes (and instances thereof) automatically get tracked by the cyclic gc. Types defined in C aren’t, unless they opt in with a little extra effort. All the builtin types that could be part of a cycle do opt in. But this does mean the gc module only knows about the types that play along.
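You can ask the collector directly whether it tracks a given object with gc.is_tracked. A small sketch, where Plain is a hypothetical ordinary class and results is just a name chosen for the example:

```python
import gc

class Plain:  # hypothetical ordinary class with a per-instance __dict__
    pass

results = {
    "list": gc.is_tracked([]),           # container type, takes part in cyclic gc
    "str": gc.is_tracked("text"),        # cannot reference other objects
    "int": gc.is_tracked(42),            # cannot reference other objects
    "class": gc.is_tracked(Plain),       # the class object itself
    "instance": gc.is_tracked(Plain()),  # an instance of an ordinary class
}
for name, tracked in results.items():
    print(name, tracked)
```

Some containers, such as dicts and tuples, may report different answers depending on their contents and the CPython version, so they are left out here.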

Apart from the collection mechanism there’s also the fact that Python has its own aggregating memory allocator (obmalloc), which allocates entire memory arenas and uses the memory for most of the smaller objects it creates. Python now does free these arenas when they’re completely empty (for a long time it didn’t), but actually emptying an arena is fairly rare: because CPython objects aren’t movable, you can’t just move some stragglers to another arena.

Answered By: Thomas Wouters

Instances of a Python class designed so that they cannot be part of a cycle are not tracked by the GC.

class V(object):
    __slots__ = ()

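Instances of V cannot have any instance attributes. Each one is 16 bytes on a 64-bit CPython build, the same as object().

sys.getsizeof(V()) and V().__sizeof__() return the same value: 16. If V instances were tracked, sys.getsizeof would also count the GC header and report a larger number.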

V isn’t useful, but I imagine that other classes derived from base types (e.g. tuple), that only add methods, can be crafted so that reference counting is enough to manage them in memory.

Answered By: user3806