How do dunder methods __getitem__ and __len__ provide iteration?
Question:
I am reading about Python’s dunder methods. One of the things I learned is that if a class provides an implementation for __getitem__ and __len__, it can be used in a for loop.
Looking at built-in classes like list, tuple, and range, I noticed that all of them provide an implementation of __iter__, which returns an iterator for the corresponding type. My understanding is that the for loop uses this iterator to traverse the elements.
However, how does it work for a class which provides __getitem__ and __len__ but not __iter__?
As an example, here’s a Range class which mimics the built-in range:
class Range:
    def __init__(self, start, stop=None, step=1):
        if step == 0:
            raise ValueError('step cannot be 0')
        if stop is None:
            start, stop = 0, start
        # Number of elements; the rounding adjustment depends on the sign of step.
        adjust = step - 1 if step > 0 else step + 1
        self._length = max(0, (stop - start + adjust) // step)
        self._start = start
        self._step = step

    def __len__(self):
        return self._length

    def __getitem__(self, k):
        if k < 0:
            k += len(self)  # support negative indices
        if not 0 <= k < self._length:
            raise IndexError('Index out of range')
        return self._start + k * self._step
Iterating over it with a for loop:
In [21]: for elem in Range(5):
    ...:     print(elem)
    ...:
0
1
2
3
4
Answers:
You can implement for x in y: ... in two ways.
- Rewrite as an infinite while loop that calls next explicitly.
    itr = iter(y)  # Using __iter__
    while True:
        try:
            x = next(itr)
        except StopIteration:
            break
        ...
- Rewrite as an iteration over range(len(y)):
    for i in range(len(y)):  # Using __len__
        x = y[i]  # Using __getitem__
        ...
  This relies on __getitem__ being defined for indices 0 through len(y) - 1.
Update: as @Jonathon1609 reminds me, __len__ is not used. Instead, the for loop requires __getitem__ to raise an IndexError to terminate iteration.
    i = 0
    while True:
        try:
            x = y[i]  # Uses __getitem__
        except IndexError:
            break
        ...
        i += 1
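To check the claim that __len__ plays no part, here is a minimal sketch. The class name Squares and its len_calls counter are made up for illustration, and the counter staying at zero reflects how CPython runs a plain for loop; other consumers (list(), for example) may still ask for a length as a size hint.

```python
class Squares:
    """Hypothetical class: has __getitem__ and a misleading __len__, but no __iter__."""

    def __init__(self):
        self.len_calls = 0  # counts how often __len__ is invoked

    def __len__(self):
        self.len_calls += 1
        return 100  # deliberately wrong, to show the loop does not rely on it

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)  # this is what ends the loop
        return index * index


squares = Squares()
collected = []
for value in squares:  # falls back to __getitem__ with 0, 1, 2, ...
    collected.append(value)

print(collected)          # [0, 1, 4, 9]
print(squares.len_calls)  # 0
```

The loop stops after four items even though __len__ claims 100, because the only stop signal is the IndexError.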
reversed is the function that can use __len__ and __getitem__ together if __reversed__ is not defined.
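A small sketch of that reversed behaviour, using two made-up classes (Letters and NoLen are illustrative names, not from the question): one defines both __len__ and __getitem__, the other only __getitem__.

```python
class Letters:
    """Hypothetical sequence-like class: __len__ and __getitem__, no __reversed__."""

    def __init__(self, text):
        self._text = text

    def __len__(self):
        return len(self._text)

    def __getitem__(self, index):
        return self._text[index]


class NoLen:
    """Hypothetical class with __getitem__ only."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index


# reversed() asks for len() and then indexes from len - 1 down to 0.
backwards = list(reversed(Letters("abc")))
print(backwards)  # ['c', 'b', 'a']

# Without __len__ (and without __reversed__), reversed() cannot work.
no_len_error = None
try:
    reversed(NoLen())
except TypeError as exc:
    no_len_error = type(exc).__name__
print(no_len_error)  # TypeError
```

So NoLen can still be used in a forward for loop, but not with reversed.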
An iterable is an object whose class defines either __iter__ or __getitem__; there is no need for __len__.
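One caveat worth knowing, sketched with a hypothetical OnlyGetItem class: the collections.abc.Iterable check only looks for __iter__, so it does not recognise a __getitem__-only class even though iter() accepts it. Calling iter(obj) is the reliable test.

```python
from collections.abc import Iterable


class OnlyGetItem:
    """Hypothetical class that iterates through __getitem__ alone."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10


obj = OnlyGetItem()

# The ABC check does not detect __getitem__-based iteration.
passes_abc_check = isinstance(obj, Iterable)
print(passes_abc_check)  # False

# iter() still succeeds, and looping works as usual.
values = [v for v in iter(obj)]
print(values)  # [0, 10, 20]
```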
The difference between the __iter__ implementation and the __getitem__ implementation is:
With __iter__, the for loop calls __next__ on the object returned from __iter__ (the iterator) until it raises StopIteration, and that’s where the loop stops.
With __getitem__, however, the loop always starts at index zero and increments the index by one on each iteration, calling obj[idx] until it raises IndexError.
For instance:
class GetItem:
    def __getitem__(self, idx):
        if idx == 10:
            raise IndexError
        return idx

for i in GetItem():
    print(i)
The result will be
0
1
2
...
9
because as soon as the index gets to 10, __getitem__ raises IndexError and the loop stops.
With __iter__, on the other hand:
class Iter:
    def __iter__(self):
        self.n = 0  # reset the state each time a loop starts
        return self

    def __next__(self):
        self.n += 1
        if self.n == 10:
            raise StopIteration
        return self.n

for i in Iter():
    print(i)  # prints 1 through 9
Here, you need to keep track of the state yourself, whereas with __getitem__ the for loop tracks the index for you, which makes it convenient for counting and indexing.
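If the manual bookkeeping in __next__ is the main annoyance, one common alternative is to write __iter__ as a generator function, so that Python keeps the loop state for you. This is only a sketch: GenIter is a made-up name, and it is written to produce the same values as the Iter class above.

```python
class GenIter:
    """Hypothetical variant of Iter whose __iter__ is a generator function."""

    def __iter__(self):
        n = 1
        while n < 10:
            yield n  # the generator remembers n between iterations
            n += 1


result = [i for i in GenIter()]
print(result)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Each call to __iter__ creates a fresh generator, so no counter has to be stored on the instance.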