List extend with a generator expression referencing the list itself

Question:

Consider the code below:

a = [1, 2]
a.extend(x*2 for x in a)
print(a)

At first glance, it would seem (to the uninitiated) that the code will generate the following output:

[1, 2, 2, 4]

But it does NOT generate that output. In fact, the program never ends – it is an infinite program. You need to convert that generator expression to a list first. Like so:

a = [1, 2]
a.extend([x*2 for x in a])
print(a)

Question is, why is it an infinite program when using the first approach? I understand that in order to execute extend, it must first evaluate the generator expression, which in turn references the original list. But here’s how I would assume it works (or should work):

  1. extend has a generator expression as the argument, so it goes ahead and evaluates the expression first
  2. generator expression is evaluated to [2, 4]
  3. now extend the original list a with [2, 4]

But looks like that’s not what actually happens.

Can anyone explain what is going on, and describe why it behaves the way it does?

Asked By: Sнаđошƒаӽ

||

Answers:

The first line is straightforward, a is to be a list [1, 2].

a = [1, 2]

The second line causes the infinite loop:

a.extend(x*2 for x in a)

What is passed to a.extend() is the generator expression x*2 for x in a. This generator effectively does something like this:

for x in a:
    yield x*2

However, what would happen if you instead ran?:

for x in a:
    yield x*2
    a.append(x*2)

This would in fact run forever, since there will always be a next x to get from the list, which keeps growing.

And that’s exactly what happens here, since .extend() exhausts the generator one item at a time, adding the result to the list, before control is returned to the generator.

However, when you run:

a = [1, 2]
a.extend([x*2 for x in a])

The square brackets cause the generator to be exhausted entirely, into a list, which is then passed to .extend(), which adds the whole thing to a in one go, as it is a list and not some other iterable like a generator.

Answered By: Grismar

Short answer:

  1. generator expression is evaluated to [2, 4]

Is wrong. The generator expression is evaluated lazily, an item at a time, no such list is produced, and the items are appended one at a time, in a way that means the generator expression eventually starts reading its own output from later in the list as it goes.

Long answer:

Generator expressions are lazy. list.extend doesn’t need to completely evaluate the argument, it can just iterate it as it goes, appending each item as it goes. The internal implementation can be written entirely in terms of append, e.g.:

def extend(self, it):
    for x in it:
        self.append(x)

In practice, it’s a C version of the loop with some minor optimizations tossed in to reduce reallocations when possible and some optimizations/protections that special-case list, tuple and direct self-extends.

This is typically a good thing; the whole point of generator expressions is to avoid a wasteful temporary list, and forcibly converting from genexpr to list would make it slower and add additional unnecessary memory expenses. If it just converted to list then extended every time an operation like this occurred, genexprs would be mostly pointless. So they don’t.

The code does include protection against the simple case of:

mylist.extend(mylist)

but it doesn’t attempt to defend against cases like this where there’s indirection in the way. There’s no way to defend against it in general (it can’t trace through all the layers of indirection possible to determine it’s really self-assignment); when they come up, you’re responsible for doing it safely.

Answered By: ShadowRanger