Why does list ask about __len__?

Question:

class Foo:
    def __getitem__(self, item):
        print('getitem', item)
        if item == 6:
            raise IndexError
        return item**2
    def __len__(self):
        print('len')
        return 3

class Bar:
    def __iter__(self):
        print('iter')
        return iter([3, 5, 42, 69])
    def __len__(self):
        print('len')
        return 3

Demo:

>>> list(Foo())
len
getitem 0
getitem 1
getitem 2
getitem 3
getitem 4
getitem 5
getitem 6
[0, 1, 4, 9, 16, 25]
>>> list(Bar())
iter
len
[3, 5, 42, 69]

Why does list call __len__? It doesn’t seem to use the result for anything obvious. A for loop doesn’t do it. This isn’t mentioned anywhere in the iterator protocol, which just talks about __iter__ and __next__.

Is this Python reserving space for the list in advance, or something clever like that?

(CPython 3.6.0 on Linux)

Asked By: wim

||

Answers:

list is a list object constructor that will allocate an initial slice of memory for its contents. The list constructor attempts to figure out a good size for that initial slice of memory by checking the length hint or the length of any object passed into the constructor . See the call to PyObject_LengthHint in the Python source here. This place is called from the list constructor — list_init

If your object has no __len__ or __length_hint__, that’s OK — a default value of 8 is used; it just may be less efficient due to reallocations.

Answered By: Eli Bendersky

See the Rationale section from PEP 424 that introduced __length_hint__ and offers insight on the motivation:

Being able to pre-allocate lists based on the expected size, as estimated by __length_hint__ , can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present.

In addition to that, the documentation for object.__length_hint__ verifies the fact that this is purely an optimization feature:

Called to implement operator.length_hint(). Should return an estimated length for the object (which may be greater or less than the actual length). The length must be an integer >= 0. This method is purely an optimization and is never required for correctness.

So __length_hint__ is here because it can result in some nice optimizations.

PyObject_LengthHint, first tries to get a value from object.__len__ (if it is defined) and then tries to see if object.__length_hint__ is available. If neither is there, it returns a default value of 8 for lists.

listextend, which is called from list_init as Eli stated in his answer, was modified according to this PEP to offer this optimization for anything that defines either a __len__ or a __length_hint__.

list isn’t the only one that benefits from this, of course, bytes objects do:

>>> bytes(Foo())
len
getitem 0
...
b'x00x01x04tx10x19'

so do bytearray objects but, only when you extend them:

>>> bytearray().extend(Foo())
len
getitem 0
...

and tuple objects which create an intermediary sequence to populate themselves:

>>> tuple(Foo())
len
getitem 0
...
(0, 1, 4, 9, 16, 25)

If anybody is wandering why exactly 'iter' is printed before 'len' in class Bar and not after as happens with class Foo:

This is because if the object in hand defines an __iter__ Python will first call it to get the iterator, thereby running the print('iter') too. The same doesn’t happen if it falls back to using __getitem__.

Note: I prepared the answer for [SO]: Why __len__ is called and the result is not used when iterating with __getitem__?, which was marked as a dupe (as it’s exactly this question) while I was writing it, so it was no longer possible to post it there, and since I already had it, I decided to post it here (with small adjustments).

Here’s a modified version of your code that makes things a bit clearer.

code00.py:

#!/usr/bin/env python3

import sys


class Foo:
    def __getitem__(self, item):
        print("{0:s}.{1:s}: {2:d}".format(self.__class__.__name__, "getitem", item))
        if item == 6:
            raise IndexError
        return item ** 2


class Bar:
    def __iter__(self):
        print("{0:s}.{1:s}".format(self.__class__.__name__, "iter"))
        return iter([3, 5, 42, 69])

    def __len__(self):
        result = 3
        print("{0:s}.{1:s}: {2:d}".format(self.__class__.__name__, "len", result))
        return result


def main():
    print("Start ...n")
    for class_obj in [Foo, Bar]:
        inst_obj = class_obj()
        print("Created {0:s} instance".format(class_obj.__name__))
        list_obj = list(inst_obj)
        print("Converted instance to list")
        print("{0:s}: {1:}n".format(class_obj.__name__, list_obj))


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}n".format(" ".join(item.strip() for item in sys.version.split("n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main()
    print("nDone.")

Output:

[cfati@CFATI-5510-0:e:WorkDevStackOverflowq041474829]> "e:WorkDevVEnvspy_064_03.07.03_test0Scriptspython.exe" code00.py
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] 64bit on win32

Start ...

Created Foo instance
Foo.getitem: 0
Foo.getitem: 1
Foo.getitem: 2
Foo.getitem: 3
Foo.getitem: 4
Foo.getitem: 5
Foo.getitem: 6
Converted instance to list
Foo: [0, 1, 4, 9, 16, 25]

Created Bar instance
Bar.iter
Bar.len: 3
Converted instance to list
Bar: [3, 5, 42, 69]


Done.

As seen, __len__ is called when the list is constructed. Browsing [GitHub]: python/cpython – (master) cpython/Objects/listobject.c:

  • list___init__ (which is the initializer: __init__ (tp_init member in PyList_Type)) calls list___init___impl
  • list___init___impl calls list_extend
  • list_extend calls PyObject_LengthHint (n = PyObject_LengthHint(iterable, 8);)
  • PyObject_LengthHint (in abstract.c), does the check:

    Py_ssize_t
    PyObject_LengthHint(PyObject *o, Py_ssize_t defaultvalue)
    
        // ...
    
        if (_PyObject_HasLen(o)) {
            res = PyObject_Length(o);
    
        // ...
    

So, it’s an optimization feature that works for iterables that define __len__.

This is particularly handy when the iterable has a large number of elements, so that they are allocated at once, and therefore skip the list growth mechanism (didn’t check if still applies, but at one point, it was): “Space increases by ~12.5% when full” (according to David M. Beazley). It is very useful when lists were constructed out of (other) lists or tuples.
For example, constructing a list from an iterable (that doesn’t define __len__) with 1000 elements, instead of allocating everything at once, there will be ~41 (log1.125(1000 / 8)) operations (allocation, data shifting, deallocation) required only for increasing the new list as it gets filled (with elements from the source iterable).

Needless to say that for “modern” iterables, the improvement no longer applies.

Answered By: CristiFati
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.