How can we write a `__getitem__` method which accepts any iterable as input, and chains the items together?
Question:
How can we turn cookiejar[(1, 2, 3)] into cookiejar[1][2][3]?
What is the desired behavior?
The following two pieces of code (LEFT CODE and RIGHT CODE) should do the same thing when calling __getitem__:
+----------------------+--------------------------+
| LEFT CODE            | RIGHT CODE               |
+----------------------+--------------------------+
| cjar = CookieJar()   | cjar = CookieJar()       |
| index = [1, 2, 3]    | index = [1, 2, 3]        |
| result = cjar[index] | it = iter(index)         |
|                      | index = next(it)         |
|                      | result = cjar[index][it] |
+----------------------+--------------------------+
More examples are shown below. The code in the column at left should exhibit the same outward behavior as the code in the column at right.
+-----------------------+-----------------------------+
| cookie_jar[1][2][3]   | cookie_jar[(1, 2, 3)]       |
+-----------------------+-----------------------------+
| cookie_jar[x][y]      | cookie_jar[(x, y)]          |
+-----------------------+-----------------------------+
| cookie_jar[99]        | cookie_jar[(99,)]           |
+-----------------------+-----------------------------+
| cookie_jar[99]        | cookie_jar[[[[99]]]]        |
+-----------------------+-----------------------------+
| cookie_jar[1][2][3]   | cookie_jar[1, 2][3]         |
+-----------------------+-----------------------------+
| cookie_jar[1][2][3]   | cookie_jar[[1, [2]], [3]]   |
+-----------------------+-----------------------------+
| cookie_jar[1][2][3]   | cookie_jar[1, 2, 3]         |
+-----------------------+-----------------------------+
| cookie_jar[3][11][19] | cookie_jar[3:20:8]          |
+-----------------------+-----------------------------+
| cookie_jar[3][11][19] | cookie_jar[range(3, 20, 8)] |
+-----------------------+-----------------------------+
What is the difference between a single key/index and a container of keys or indices?
If you try to convert table["hello world"] into table['h']['e']['l']['l']['o']... ['l']['d'], you can easily create an infinite loop. The following code never stops running:
def go_to_leaf(root):
    while hasattr(root, '__iter__'):
        root = iter(root)
        root = next(root)
    return root

# BEGIN INFINITE LOOP!
left_most_leaf = go_to_leaf("hello world")
Something like this should be used instead:
def is_leaf(cls, knode):
    """
    Returns True if the input is a valid key (index)
    into the container.
    Returns False if the input is a container of keys
    or is an invalid key.
    """
    if hasattr(knode, "__iter__"):
        return str(knode) == "".join(str(elem) for elem in knode)
    else:  # not iterable
        return True
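The point of is_leaf is that descent must stop at strings. As a sketch, here is a bounded version of go_to_leaf built on the same test. The names go_to_leaf_safe and the plain-function form of is_leaf (without cls) are illustrative assumptions, not code from the question.

```python
def is_leaf(knode):
    """The question's is_leaf test, written as a plain function (no cls)."""
    if hasattr(knode, "__iter__"):
        # A string equals the concatenation of its characters; a list or tuple
        # normally does not, because its str() includes brackets and commas
        return str(knode) == "".join(str(elem) for elem in knode)
    return True  # not iterable


def go_to_leaf_safe(root):
    """Hypothetical bounded go_to_leaf: follow first elements until is_leaf() says stop.

    An empty container raises StopIteration, since it has no first element.
    """
    while not is_leaf(root):
        root = next(iter(root))
    return root


print(go_to_leaf_safe("hello world"))   # hello world
print(go_to_leaf_safe([["ab", 2], 3]))  # ab
```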
If you had a 3-dimensional table of numbers, it should not matter whether the coordinates are packed into a single tuple or list or passed separately:
element = table[2][7][3]
element = table[2, 7, 3]
element = table[(2, 7, 3)]
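For context, `table[2, 7, 3]` and `table[(2, 7, 3)]` are the same expression in Python, because both pass the single tuple `(2, 7, 3)` to `__getitem__`. A plain nested list only supports the chained form. The following sketch uses made-up data to illustrate this:

```python
# A plain nested list: table[x][y][z] == 100*x + 10*y + z (illustrative data)
table = [[[100 * x + 10 * y + z for z in range(4)] for y in range(8)] for x in range(3)]

print(table[2][7][3])  # 273

try:
    table[2, 7, 3]  # the key arrives as the single tuple (2, 7, 3)
except TypeError as exc:
    print("TypeError:", exc)  # lists reject tuple indices
```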
Answers:
Basic idea
Instead of making a separate container type, make a view for containers. The semantics are:
- A view instance tracks some iterable (which might be an element of some other iterable). For simplicity, we won't bother checking whether it's a proper container type or lazily evaluated.
- When the view is indexed with a value of a non-iterable type, it indexes into the container with that value.
- When the view is indexed with a value of an iterable type, it repeats the indexing for each element in that value.
- If the result of the indexing is iterable, the result is a view around that iterable. Otherwise, the result is the value itself.
It can be implemented quite simply:
class View:
    def __init__(self, data):
        self._data = data

    def __getitem__(self, indices):
        result = self._data
        # We can't easily distinguish a `TypeError` due to `indices`
        # being a non-iterable, from a `TypeError` due to reaching a
        # leaf in the data prematurely. So we explicitly check first.
        try:
            iter(indices)
        except TypeError:
            result = result[indices]
        else:
            for i in indices:
                result = result[i]
        # Now decide whether to wrap the result
        try:
            iter(result)
        except TypeError:
            return result
        else:
            return View(result)
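The following usage sketch for this first View uses made-up nested-list data. The class is repeated so the snippet runs on its own. Both the tuple form and the chained form reach the same leaf. Intermediate containers come back wrapped in a View.

```python
class View:
    """Copy of the first View above, repeated so this snippet runs on its own."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, indices):
        result = self._data
        try:
            iter(indices)
        except TypeError:
            result = result[indices]  # single, non-iterable key
        else:
            for i in indices:  # iterable key: index once per element
                result = result[i]
        try:
            iter(result)
        except TypeError:
            return result  # leaf value: return it unwrapped
        else:
            return View(result)  # still a container: keep it wrapped


data = [[1, 2], [3, [4, 5]]]  # illustrative data
print(View(data)[1, 1, 0])  # 4
print(View(data)[1][1][0])  # 4
```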
As a refactoring, we could use __new__ rather than __init__ so that the argument is returned unchanged if it isn't iterable. That prevents explicitly creating bad Views, and can also simplify the __getitem__ logic:
class View:
    def __new__(cls, data):
        try:
            iter(data)
            result = object.__new__(cls)
            result._data = data
        except TypeError:
            result = data
        return result

    def __getitem__(self, indices):
        result = self._data
        try:
            iter(indices)
        except TypeError:
            result = result[indices]
        else:
            for i in indices:
                result = result[i]
        return View(result)
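The following quick check of the __new__-based View uses illustrative dict data. It confirms that non-iterables pass through unchanged. It also makes the string problem from the special cases concrete: a string key is iterated one character at a time, and a one-character string is still wrapped in a View.

```python
class View:
    """Copy of the __new__-based View above, repeated so this snippet runs on its own."""

    def __new__(cls, data):
        try:
            iter(data)
            result = object.__new__(cls)
            result._data = data
        except TypeError:
            result = data  # non-iterables are returned unchanged
        return result

    def __getitem__(self, indices):
        result = self._data
        try:
            iter(indices)
        except TypeError:
            result = result[indices]
        else:
            for i in indices:
                result = result[i]
        return View(result)


print(View(5))                          # 5: not iterable, so no View is created
print(View({"a": {"b": 1}})["a", "b"])  # 1
print(type(View("abc")[0]).__name__)    # View: "a" is iterable, so it gets wrapped

try:
    View({"hello": 1})["hello"]  # the key is iterated character by character
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'h'
```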
Special cases
There are two problems with this result compared to the specification:
- slice objects are not actually iterable. We want to interpret myview[3:20:8] as if it were actually being indexed with the values described by that range, in sequence. Fortunately, it is trivial to convert a slice into the corresponding range object with the same start, stop and step. However, we need to complain if the start or stop is unspecified, since otherwise the semantics don't make any sense; and we have to keep in mind that ranges don't accept None as a step value (slices treat it as equivalent to 1). Finally, we have to accept that negative values will not index from the end, since again it would be far too difficult to interpret what should happen for all the corner cases.
- Strings (and possibly other types) are iterable, and their elements are themselves non-empty strings, so they can be indexed into arbitrarily many times. We need to special-case these in order for them to work as leaf nodes.
We need helper logic to treat strings as if they were not iterable. It should apply to construction, too (since otherwise we could make a totally useless View instance from a string). We don't want that logic to handle slices, because View(slice(0)) should give us the original slice back, not a range.
With some refactoring, we get:
def _make_range(a_slice):
    start, stop, step = a_slice.start, a_slice.stop, a_slice.step
    if start is None or stop is None:
        raise ValueError('start and stop must be defined to convert to range')
    # range() rejects a None step; slices treat None as 1
    return range(start, stop, 1 if step is None else step)

def _non_string_iterable(obj):
    try:
        iter(obj)
        return not isinstance(obj, str)
    except TypeError:
        return False

class View:
    def __new__(cls, data):
        if _non_string_iterable(data):
            result = object.__new__(cls)
            result._data = data
            return result
        return data  # leaves (including strings) are returned unchanged

    def __getitem__(self, indices):
        result = self._data
        if isinstance(indices, slice):
            indices = _make_range(indices)
        if _non_string_iterable(indices):
            for i in indices:
                result = result[i]
        else:
            result = result[indices]
        return View(result)
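The View above only unpacks one level of the key. For that reason, rows of the question's table such as cookie_jar[[[[99]]]] and cookie_jar[[1, [2]], [3]] still try to index the data with a list. With dict data, that raises TypeError. The following is a possible extension, assuming arbitrarily nested keys should be flattened. It adds a hypothetical recursive helper, `_flatten_keys`, which is not part of the answer above. The jar data is made up to cover the rows of the table.

```python
def _make_range(a_slice):
    start, stop, step = a_slice.start, a_slice.stop, a_slice.step
    if start is None or stop is None:
        raise ValueError('start and stop must be defined to convert to range')
    return range(start, stop, 1 if step is None else step)


def _non_string_iterable(obj):
    try:
        iter(obj)
        return not isinstance(obj, str)
    except TypeError:
        return False


def _flatten_keys(indices):
    """Hypothetical helper: yield the leaf keys of an arbitrarily nested key."""
    if isinstance(indices, slice):
        yield from _make_range(indices)
    elif _non_string_iterable(indices):
        for item in indices:
            yield from _flatten_keys(item)  # recurse into nested containers
    else:
        yield indices  # a single key (including a whole string)


class View:
    def __new__(cls, data):
        if _non_string_iterable(data):
            result = object.__new__(cls)
            result._data = data
            return result
        return data

    def __getitem__(self, indices):
        result = self._data
        for key in _flatten_keys(indices):
            result = result[key]
        return View(result)


jar = {  # illustrative data
    1: {2: {3: "cookie"}},
    99: "crumb",
    3: {11: {19: "biscuit"}},
    "hello": {"world": "string keys stay whole"},
}
print(View(jar)[[1, [2]], [3]])     # cookie
print(View(jar)[[[[99]]]])          # crumb
print(View(jar)[3:20:8])            # biscuit
print(View(jar)["hello", "world"])  # string keys stay whole
```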
Combining collapse() and a Python version of dig(), with special slice handling, reproduces your input table of examples:
from more_itertools import collapse  # or implement this yourself
from unittest.mock import MagicMock

def dig(collection, *keys):
    """Dig into nested subscriptable objects, e.g. dict and list, i.e. JSON."""
    curr = collection
    for k in keys:
        if curr is None:
            break
        if not hasattr(curr, '__getitem__') or isinstance(curr, str):
            raise TypeError(f'cannot dig into {type(curr)}')
        try:
            curr = curr[k]
        except (KeyError, IndexError):
            curr = None  # missing key/index: stop and return None
    return curr

def what_you_wanted(collection, *keys):  # If I understood you correctly
    # A lone slice is expanded via range(); anything else is flattened by collapse()
    slic = keys[0] if len(keys) == 1 and isinstance(keys[0], slice) else None
    dig_keys = range(slic.stop)[slic] if slic else collapse(keys)
    return dig(collection, *dig_keys)

def test_getitem_with(*keys):
    # MagicMock's __getitem__ returns a child mock by default,
    # so mock_calls records each chained lookup
    mock = MagicMock()
    what_you_wanted(mock, *keys)
    print(mock.mock_calls)

test_getitem_with((1, 2, 3))
test_getitem_with(('x', 'y'))
test_getitem_with((99,))
test_getitem_with([[[99]]])
test_getitem_with((1, 2), 3)
test_getitem_with(([1, [2]], [3]))
test_getitem_with(1, 2, 3)
test_getitem_with(slice(3, 20, 8))
test_getitem_with(range(3, 20, 8))
Prints:
[call.__getitem__(1),
call.__getitem__().__getitem__(2),
call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__('x'), call.__getitem__().__getitem__('y')]
[call.__getitem__(99)]
[call.__getitem__(99)]
[call.__getitem__(1),
call.__getitem__().__getitem__(2),
call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__(1),
call.__getitem__().__getitem__(2),
call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__(1),
call.__getitem__().__getitem__(2),
call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__(3),
call.__getitem__().__getitem__(11),
call.__getitem__().__getitem__().__getitem__(19)]
[call.__getitem__(3),
call.__getitem__().__getitem__(11),
call.__getitem__().__getitem__().__getitem__(19)]
For completeness, you could define a collection object (or View) that implements __getitem__() using what_you_wanted().
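Here is a minimal sketch of such a collection object. `DigJar` is a hypothetical name. `_collapse` is a small stand-in for more_itertools.collapse that only covers the cases used here, so the snippet has no third-party dependency. The class re-wraps container results so that chained lookups such as jar[1, 2][3] keep flattening.

```python
def _collapse(iterable):
    """Minimal stand-in for more_itertools.collapse: flatten nested iterables,
    treating str and bytes as atoms."""
    for item in iterable:
        if isinstance(item, (str, bytes)) or not hasattr(item, "__iter__"):
            yield item
        else:
            yield from _collapse(item)


def dig(collection, *keys):
    """Dig into nested subscriptable objects, e.g. dict and list, i.e. JSON."""
    curr = collection
    for k in keys:
        if curr is None:
            break
        if not hasattr(curr, '__getitem__') or isinstance(curr, str):
            raise TypeError(f'cannot dig into {type(curr)}')
        try:
            curr = curr[k]
        except (KeyError, IndexError):
            curr = None
    return curr


def what_you_wanted(collection, *keys):
    slic = keys[0] if len(keys) == 1 and isinstance(keys[0], slice) else None
    dig_keys = range(slic.stop)[slic] if slic else _collapse(keys)
    return dig(collection, *dig_keys)


class DigJar:
    """Hypothetical wrapper whose __getitem__ delegates to what_you_wanted()."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        # jar[1, 2, 3] passes the tuple (1, 2, 3) as a single key
        result = what_you_wanted(self._data, key)
        # Re-wrap containers so chained lookups like jar[1, 2][3] keep working
        if hasattr(result, '__getitem__') and not isinstance(result, str):
            return DigJar(result)
        return result


jar = DigJar({1: {2: {3: "cookie"}}, 99: "crumb", 3: {11: {19: "biscuit"}}})
print(jar[1, 2][3])          # cookie
print(jar[range(3, 20, 8)])  # biscuit
print(jar[42])               # None (dig() turns a missing key into None)
```

Note that, like dig(), this returns None for a missing key instead of raising KeyError.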