Python Slice Assignment Memory Usage

Question:

I read in a comment here on Stack Overflow that it is more memory efficient to do slice assignment when changing lists. For example,

a[:] = [i + 6 for i in a]

should be more memory efficient than

a = [i + 6 for i in a]

because the former replaces elements in the existing list, while the latter creates a new list and rebinds a to that new list, leaving the old a in memory until it can be garbage collected. Benchmarking the two for speed, the latter is slightly quicker:

$ python -mtimeit -s 'a = [1, 2, 3]' 'a[:] = [i + 6 for i in a]'
1000000 loops, best of 3: 1.53 usec per loop
$ python -mtimeit -s 'a = [1, 2, 3]' 'a = [i + 6 for i in a]'
1000000 loops, best of 3: 1.37 usec per loop

That is what I’d expect, as rebinding a variable should be faster than replacing elements in a list. However, I can’t find any official documentation which supports the memory usage claim, and I’m not sure how to benchmark that.

On the face of it, the memory usage claim makes sense to me. However, giving it some more thought, I would expect that in the former method, the interpreter would create a new list from the list comprehension and then copy the values from that list to a, leaving the anonymous list in floating around until it is garbage collected. If that’s the case, then the former method would use the same amount of memory while also being slower.

Can anyone show definitively (with a benchmark or official documentation) which of the two methods is more memory efficient/which is the preferred method?

Thanks in advance.

Asked By: Mitch Lindgren

||

Answers:

The line

a[:] = [i + 6 for i in a]

would not save any memory. Python does evaluate the right hand side first, as stated in the language documentation:

An assignment statement evaluates the expression list (remember that this can be a single expression or a comma-separated list, the latter yielding a tuple) and assigns the single resulting object to each of the target lists, from left to right.

In the case at hand, the single resulting object would be a new list, and the single target in the target list would be a[:].

We could replace the list comprehension by a generator expression:

a[:] = (i + 6 for i in a)

Now, the right hand side evaluates to a generator instead of a list. Benchmarking shows that this is still slower than the naive

a = [i + 6 for i in a]

So does the generator expression actually save any memory? At first glance, you might think it does. But delving in to the source code of the function list_ass_slice() shows that it does not. The line

v_as_SF = PySequence_Fast(v, "can only assign an iterable");

uses PySequence_Fast() to convert the iterable (in this case the generator) into a tuple first, which is then copied into the old list. A tuple uses the same amount of memory as a list, so using a generator expression is basically the same as using a list comprehension in this case. During the last copy, the items of the original list are reused.

The moral seems to be that in this case the simplest approach is best in every regard.

Answered By: Sven Marnach