Why do two identical lists have a different memory footprint?

Question:

I created two lists l1 and l2, but each one with a different creation method:

import sys

l1 = [None] * 10
l2 = [None for _ in range(10)]

print('Size of l1 =', sys.getsizeof(l1))
print('Size of l2 =', sys.getsizeof(l2))

But the output surprised me:

Size of l1 = 144
Size of l2 = 192

The list created with a list comprehension is a bigger size in memory, but the two lists are identical in Python otherwise.

Why is that? Is this some CPython internal thing, or some other explanation?

Asked By: Andrej Kesely

||

Answers:

When you write [None] * 10, Python knows that it will need a list of exactly 10 objects, so it allocates exactly that.

When you use a list comprehension, Python doesn’t know how much it will need. So it gradually grows the list as elements are added. For each reallocation it allocates more room than is immediately needed, so that it doesn’t have to reallocate for each element. The resulting list is likely to be somewhat bigger than needed.

You can see this behavior when comparing lists created with similar sizes:

>>> sys.getsizeof([None]*15)
184
>>> sys.getsizeof([None]*16)
192
>>> sys.getsizeof([None for _ in range(15)])
192
>>> sys.getsizeof([None for _ in range(16)])
192
>>> sys.getsizeof([None for _ in range(17)])
264

You can see that the first method allocates just what is needed, while the second one grows periodically. In this example, it allocates enough for 16 elements, and had to reallocate when reaching the 17th.

Answered By: interjay

As noted in this question the list-comprehension uses list.append under the hood, so it will call the list-resize method, which overallocates.

To demonstrate this to yourself, you can actually use the dis dissasembler:

>>> code = compile('[x for x in iterable]', '', 'eval')
>>> import dis
>>> dis.dis(code)
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x10560b810, file "", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (iterable)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x10560b810, file "", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (x)
              8 LOAD_FAST                1 (x)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE
>>>

Notice the LIST_APPEND opcode in the disassembly of the <listcomp> code object. From the docs:

LIST_APPEND(i)

Calls list.append(TOS[-i], TOS). Used to implement list comprehensions.

Now, for the list-repetition operation, we have a hint about what is going on if we consider:

>>> import sys
>>> sys.getsizeof([])
64
>>> 8*10
80
>>> 64 + 80
144
>>> sys.getsizeof([None]*10)
144

So, it seems to be able to exactly allocate the size. Looking at the source code, we see this is exactly what happens:

static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
    Py_ssize_t i, j;
    Py_ssize_t size;
    PyListObject *np;
    PyObject **p, **items;
    PyObject *elem;
    if (n < 0)
        n = 0;
    if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
        return PyErr_NoMemory();
    size = Py_SIZE(a) * n;
    if (size == 0)
        return PyList_New(0);
    np = (PyListObject *) PyList_New(size);

Namely, here: size = Py_SIZE(a) * n;. The rest of the functions simply fills the array.

Answered By: juanpa.arrivillaga

None is a block of memory, but it is not a pre-specified size. In addition to that, there is some extra spacing in an array between array elements. You can see this yourself by running:

for ele in l2:
    print(sys.getsizeof(ele))

>>>>16
16
16
16
16
16
16
16
16
16

Which does not total the size of l2, but rather is less.

print(sys.getsizeof([None]))
72

And this is much greater than one tenth of the size of l1.

Your numbers should vary depending on both the details of your operating system and the details of current memory usage in your operating system. The size of [None] can never be bigger than the available adjacent memory where the variable is set to be stored, and the variable may have to be moved if it is later dynamically allocated to be larger.

Answered By: StevenJD