python generators garbage collection

Question:

I think my question is related to this, but not exactly similar. Consider this code:

def countdown(n):
    try:
        while n > 0:
            yield n
            n -= 1
    finally:
        print('In the finally block')

def main():
    for n in countdown(10):
        if n == 5:
            break
        print('Counting... ', n)
    print('Finished counting')

main()

The output of this code is:

Counting...  10
Counting...  9
Counting...  8
Counting...  7
Counting...  6
In the finally block
Finished counting

Is it guaranteed that the line “In the finally block” is going to be printed before “Finished counting”? Or is this because of the CPython implementation detail that an object is garbage collected as soon as its reference count reaches 0?

Also, I am curious about how the finally block of the countdown generator gets executed. For example, if I change the code of main to

def main():
    c = countdown(10)
    for n in c:
        if n == 5:
            break
        print('Counting... ', n)
    print('Finished counting')

then I do see Finished counting printed before In the finally block. How does the garbage collector go directly to the finally block? I think I have always taken try/except/finally at face value, but thinking about it in the context of generators is making me think twice.

Asked By: skgbanga

Answers:

You are, as you expected, relying on implementation-specific behavior of CPython reference counting. [1]

In fact, if you run this code in, say, PyPy, the output will usually be:

Counting...  10
Counting...  9
Counting...  8
Counting...  7
Counting...  6
Finished counting
In the finally block

And if you run it in an interactive PyPy session, that last line may come many lines later, or even only when you finally exit.


If you look at how generators are implemented, they have methods roughly like this:

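def __del__(self):
    self.close()
def close(self):
    try:
        self.throw(GeneratorExit)  # raise GeneratorExit at the suspended yield
    except (GeneratorExit, StopIteration):
        pass  # the generator exited, one way or another
    else:
        raise RuntimeError('generator ignored GeneratorExit')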

CPython deletes objects immediately when the reference count becomes zero (it also has a garbage collector to break up cyclic references, but that isn’t relevant here). As soon as the last reference to the generator goes away, the generator gets deleted, which closes it, which resumes the generator frame by raising a GeneratorExit at the suspended yield. And of course there’s no handler for the GeneratorExit, so the finally clause gets executed and the exception propagates up the stack to close, which swallows it.
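
To watch that sequence without depending on when the object is collected, you can throw the exception in by hand. This is a minimal sketch, assuming the countdown generator from the question; the events list is an illustrative addition that only records the order in which things happen:

```python
events = []  # hypothetical log, only here to record ordering

def countdown(n):
    try:
        while n > 0:
            yield n
            n -= 1
    finally:
        events.append('finally')

gen = countdown(3)
next(gen)  # advance to the first yield, so the frame is suspended inside the try

try:
    # Roughly what close() does: raise GeneratorExit at the suspended yield.
    gen.throw(GeneratorExit)
except GeneratorExit:
    # Nothing in countdown handles it, so it propagates back out after finally runs.
    events.append('GeneratorExit reached the caller')

print(events)
```

The finally entry is recorded first, and only then does the exception reach the caller. close() does the same thing and simply discards the exception instead of letting it escape.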

In PyPy, which uses a hybrid garbage collector, the generator doesn’t get deleted until the next time the GC decides to scan. And in an interactive session, with low memory pressure, that could be as late as exit time. But once it does, the same thing happens.

You can see this by handling the GeneratorExit explicitly:

def countdown(n):
    try:
        while n > 0:
            yield n
            n -= 1
    except GeneratorExit:
        print('Exit!')
        raise
    finally:
        print('In the finally block')

(If you leave the raise off, you’ll get the same results for only slightly different reasons.)
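
The "slightly different reasons" can be sketched as follows. As far as I understand it, close() is satisfied either when GeneratorExit propagates back out or when the generator simply finishes; what it does not accept is the generator yielding another value. The names quiet, stubborn, log, and outcome below are made up for illustration:

```python
log = []  # hypothetical log, only here to record ordering

def quiet():
    try:
        yield 1
    except GeneratorExit:
        log.append('Exit!')  # no re-raise: the generator simply finishes
    finally:
        log.append('finally')

def stubborn():
    try:
        yield 1
    except GeneratorExit:
        yield 2  # yielding again while being closed is not allowed

g = quiet()
next(g)
g.close()  # accepted: the generator finished instead of re-raising

s = stubborn()
next(s)
try:
    s.close()
    outcome = 'closed'
except RuntimeError:
    outcome = 'RuntimeError'  # close() complains that GeneratorExit was ignored

print(log, outcome)
```

So swallowing GeneratorExit is harmless as long as the generator then ends; trying to keep producing values is what turns into an error.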


You can explicitly close a generator—and, unlike the stuff above, this is part of the public interface of the generator type:

def main():
    c = countdown(10)
    for n in c:
        if n == 5:
            break
        print('Counting... ', n)
    c.close()
    print('Finished counting')

Or, of course, you can use a with statement:

import contextlib

def main():
    with contextlib.closing(countdown(10)) as c:
        for n in c:
            if n == 5:
                break
            print('Counting... ', n)
    print('Finished counting')

1. As Tim Peters’ answer points out, you’re also relying on implementation-specific behavior of the CPython compiler in the second test.

Answered By: abarnert

I endorse @abarnert’s answer, but since I already typed this …

Yes, the behavior in your first example is an artifact of CPython’s reference counting. When you break out of the loop, the anonymous generator-iterator object that countdown(10) returned loses its last reference, and so is garbage-collected at once. That in turn triggers the generator’s finally: suite.

In your second example, the generator-iterator remains bound to c until your main() exits, so as far as CPython knows you may resume c at any time. It’s not “garbage” until main() exits. A fancier compiler could notice that c is never referenced after the loop ends, and decide to effectively del c before then, but CPython makes no attempt to predict the future. All local names remain bound until you explicitly unbind them yourself, or the scope in which they’re local ends.
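
A small sketch of the "explicitly unbind them yourself" case, assuming CPython: after the break, c holds the only remaining reference, so del c should trigger the finally: suite right away. The events list is an illustrative addition, and on other implementations the order may well differ:

```python
events = []  # hypothetical log, only here to record ordering

def countdown(n):
    try:
        while n > 0:
            yield n
            n -= 1
    finally:
        events.append('finally')

def main():
    c = countdown(10)
    for n in c:
        if n == 5:
            break
    del c  # unbind the name; on CPython this drops the last reference
    events.append('finished')

main()
print(events)  # on CPython: ['finally', 'finished']
```

This still leans on reference counting, so it is a way to see the mechanism rather than something to rely on; c.close() gives the same ordering on any implementation.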

Answered By: Tim Peters