How can we write a `__getitem__` method which accepts any iterable as input, and chains the items together?

Question:

How can we turn cookiejar[(1, 2, 3)] into cookiejar[1][2][3]?

What is the desired behavior?

The following two pieces of code (LEFT CODE and RIGHT CODE) should do the same thing when calling __getitem__

+-------------------------+---------------------------+
|        LEFT CODE        |        RIGHT CODE         |
+-------------------------+---------------------------+
| cjar    = CookieJar()   | cjar    = CookieJar()     |
| indices = [1, 2, 3]     | indices = [1, 2, 3]       |
| result  = cjar[indices] | it      = iter(indices)   |
|                         | index   = next(it)        |
|                         | result  = cjar[index][it] |
+-------------------------+---------------------------+

More examples are shown below. The code in the column at left should exhibit the same outward behavior as the code in the column at right.

+----------------------------+-------------------------------+
|  cookie_jar[1][2][3]       |  cookie_jar[(1, 2, 3)]        |
+----------------------------+-------------------------------+
|  cookie_jar[x][y]          |  cookie_jar[(x, y)]           |
+----------------------------+-------------------------------+
|  cookie_jar[99]            |  cookie_jar[(99,)]            |
+----------------------------+-------------------------------+
|  cookie_jar[99]            |  cookie_jar[[[[99]]]]         |
+----------------------------+-------------------------------+
|  cookie_jar[1][2][3]       |  cookie_jar[1, 2][3]          |
+----------------------------+-------------------------------+
|  cookie_jar[1][2][3]       |  cookie_jar[[1, [2]], [3]]    |
+----------------------------+-------------------------------+
|  cookie_jar[1][2][3]       |  cookie_jar[1, 2, 3]          |
+----------------------------+-------------------------------+
|  cookie_jar[3][11][19]     |  cookie_jar[3:20:8]           |
+----------------------------+-------------------------------+
|  cookie_jar[3][11][19]     |  cookie_jar[range(3, 20, 8)]  |
+----------------------------+-------------------------------+
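
To see what each right-hand expression in the table actually hands to `__getitem__`, a tiny probe class helps. `Recorder` is a hypothetical name used only for this illustration; it simply returns the key object it receives.

```python
class Recorder:
    """Hypothetical probe: returns whatever key object Python passes in."""

    def __getitem__(self, key):
        return key


r = Recorder()
print(r[1, 2, 3])        # (1, 2, 3)  -- commas build a tuple
print(r[(1, 2, 3)])      # (1, 2, 3)  -- identical to the line above
print(r[3:20:8])         # slice(3, 20, 8)
print(r[[1, [2]], [3]])  # ([1, [2]], [3])
```

So `cookie_jar[1, 2, 3]` and `cookie_jar[(1, 2, 3)]` are the same call, while `3:20:8` arrives as a `slice` object rather than as a sequence of indices.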

What is the difference between a single key/index and a container of keys or indices?

If you try to convert table["hello world"] into table['h']['e']['l']['l']['o']... ['l']['d'] you can easily create an infinite loop.

The following code never stops running:

def go_to_leaf(root):
    while hasattr(root, '__iter__'):
        root = iter(root)
        root = next(root)
    return root  # never reached for a string: next(iter("h")) yields "h" again

# BEGIN INFINITE LOOP!
left_most_leaf = go_to_leaf("hello world")

Something like this should be used instead:

def is_leaf(cls, knode):
    """
        returns true if the input is a valid key (index)
        into the container.

        returns false if the input is a container of keys
        or is an invalid key  
    """
    if hasattr(knode, "__iter__"):
        return str(knode) == "".join(str(elem) for elem in knode)
    else: # not iterable
        return True
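
As a quick check of how that test classifies a few inputs, here is the same logic as a standalone function. The `cls` parameter is dropped here as an assumption, so that the snippet runs outside a class.

```python
def is_leaf(knode):
    """Return True if knode should be treated as a single key."""
    if hasattr(knode, "__iter__"):
        # A string equals the concatenation of its own characters;
        # the str() of a list or tuple does not.
        return str(knode) == "".join(str(elem) for elem in knode)
    return True  # not iterable


for candidate in ("hello world", 42, [1, 2, 3], ("x", "y")):
    print(repr(candidate), is_leaf(candidate))
# 'hello world' True
# 42 True
# [1, 2, 3] False
# ('x', 'y') False
```

One caveat: the check iterates over its argument, so a one-shot iterator such as a generator would be consumed by it.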

If you had a 3-dimensional table of numbers, it would not matter whether the x-y-z coordinates were inside a single tuple or list, or used separately.

element = table[2][7][3]
element = table[2, 7, 3]
element = table[(2, 7, 3)]
Asked By: Samuel Muldoon

||

Answers:

Basic idea

Instead of making a separate container type, make a view for containers. The semantics are:

  • A view instance tracks some iterable (which might be an element of some other iterable). For simplicity, we won’t bother checking whether it’s a proper container type or lazily evaluated.

  • When the view is indexed with a value of a non-iterable type, it indexes into the container with that value.

  • When the view is indexed with a value of an iterable type, it repeats the indexing for each element in that value.

  • If the result of the indexing is iterable, the result is a view around that iterable. Otherwise, the result is the value itself.

It can be implemented quite simply:

class View:
    def __init__(self, data):
        self._data = data

    def __getitem__(self, indices):
        result = self._data
        # We can't easily distinguish a `TypeError` due to `indices`
        # being a non-iterable, from a `TypeError` due to reaching a 
        # leaf in the data prematurely. So we explicitly check first.
        try:
            iter(indices)
        except TypeError:
            result = result[indices]
        else:
            for i in indices:
                result = result[i]
        # Now decide whether to wrap the result
        try:
            iter(result)
        except TypeError:
            return result
        else:
            return View(result)

As a refactoring, we could use __new__ rather than __init__ so that the argument is returned unchanged if it isn’t iterable. That prevents explicitly creating bad Views, and can also simplify the __getitem__ logic:

class View:
    def __new__(cls, data):
        try:
            iter(data)
            result = object.__new__(cls)
            result._data = data
        except TypeError:
            result = data
        return result

    def __getitem__(self, indices):
        result = self._data
        try:
            iter(indices)
        except TypeError:
            result = result[indices]
        else:
            for i in indices:
                result = result[i]
        return View(result)
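
A short usage sketch may make the behaviour concrete. The class is restated in condensed form so the snippet runs on its own, and the nested list `table` is made-up sample data.

```python
class View:
    def __new__(cls, data):
        try:
            iter(data)
        except TypeError:
            return data  # non-iterable: hand the value back unchanged
        view = object.__new__(cls)
        view._data = data
        return view

    def __getitem__(self, indices):
        result = self._data
        try:
            iter(indices)
        except TypeError:
            result = result[indices]
        else:
            for i in indices:
                result = result[i]
        return View(result)


table = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]  # sample data
view = View(table)
print(view[1][0][1])              # 5
print(view[1, 0, 1])              # 5
print(view[(1, 0, 1)])            # 5
print(type(view[1, 0]).__name__)  # View (the result [4, 5] is still iterable)
print(View(5))                    # 5 (no wrapper is created)
```

Note that this version does not flatten nested index lists such as `view[[1, [0]], [1]]`, and it still wraps strings, which is one of the special cases discussed next.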

Special cases

There are two problems with this result compared to the specification:

  1. slice objects are not actually iterable. We want to interpret myview[3:20:8] as if it were actually being indexed with the values described by that range, in sequence. Fortunately, it is trivial to convert a slice into the corresponding range object with the same start, stop and step.

    However, we need to complain if the start or stop are unspecified, since otherwise the semantics don’t make any sense; and we have to keep in mind that ranges don’t accept None as a step value (slices treat it as equivalent to 1). Finally, we have to accept that negative values will not index from the end, since again it will be far too difficult to interpret what should happen for all the corner cases.

  2. Strings (and possibly other types) are iterable, and the elements are themselves non-empty strings – thus they can be indexed into arbitrarily many times. We need to special-case these in order for them to work as leaf nodes.

We need helper logic to treat strings as if they were not iterable. It should apply to construction, too (since otherwise we could make a totally useless View instance from a string). We don’t want that logic to handle slices, because View(slice(0)) should give us the original slice back, not a range.
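
The facts behind both problems can be checked directly in plain Python. The helper name `is_iterable` is made up for this illustration.

```python
def is_iterable(obj):
    """Hypothetical helper: True if iter(obj) succeeds."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


s = slice(3, 20, 8)
print(is_iterable(s))                         # False: slices cannot be looped over
print(list(range(s.start, s.stop, s.step)))   # [3, 11, 19]

# range() does not accept None as a step, unlike slice.
no_step = slice(3, 6)
try:
    range(no_step.start, no_step.stop, no_step.step)
except TypeError:
    print("range() rejects a None step")

# The only element of a one-character string is that same string,
# so descending into it never reaches a non-iterable leaf.
ch = "h"
print(next(iter(ch)) == ch)                   # True
```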

With some refactoring, we get:

def _make_range(a_slice):
    start, stop, step = a_slice.start, a_slice.stop, a_slice.step
    if start is None or stop is None:
        raise ValueError('start and stop must be defined to convert to range')
    return range(start, stop, 1 if step is None else step)

def _non_string_iterable(obj):
    try:
        iter(obj)
        return not isinstance(obj, str)
    except TypeError:
        return False

class View:
    def __new__(cls, data):
        if _non_string_iterable(data):
            result = object.__new__(cls)
            result._data = data
            return result
        return data

    def __getitem__(self, indices):
        result = self._data
        if isinstance(indices, slice):
            indices = _make_range(indices)
        if _non_string_iterable(indices):
            for i in indices:
                result = result[i]
        else:
            result = result[indices]
        return View(result)
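
A usage sketch of this final version, with the helpers restated so it runs standalone; the nested dict is made-up sample data:

```python
def _make_range(a_slice):
    start, stop, step = a_slice.start, a_slice.stop, a_slice.step
    if start is None or stop is None:
        raise ValueError('start and stop must be defined to convert to range')
    return range(start, stop, 1 if step is None else step)


def _non_string_iterable(obj):
    try:
        iter(obj)
        return not isinstance(obj, str)
    except TypeError:
        return False


class View:
    def __new__(cls, data):
        if _non_string_iterable(data):
            result = object.__new__(cls)
            result._data = data
            return result
        return data

    def __getitem__(self, indices):
        result = self._data
        if isinstance(indices, slice):
            indices = _make_range(indices)
        if _non_string_iterable(indices):
            for i in indices:
                result = result[i]
        else:
            result = result[indices]
        return View(result)


jar = View({3: {11: {19: "cookie"}}, "hello world": "greeting"})  # sample data
print(jar[3][11][19])        # cookie
print(jar[3:20:8])           # cookie
print(jar[range(3, 20, 8)])  # cookie
print(jar["hello world"])    # greeting (the string key is used whole)
try:
    jar[3:]                  # open-ended slice
except ValueError as exc:
    print("rejected:", exc)
```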
Answered By: Karl Knechtel

Combining collapse() and a Python version of dig(), with special slice handling, reproduces your input table of examples:

from more_itertools import collapse  # or implement this yourself
from unittest.mock import MagicMock


def dig(collection, *keys):
    """Dig into nested subscriptable objects, e.g. dict and list, i.e. JSON."""
    curr = collection
    for k in keys:
        if curr is None:
            break

        if not hasattr(curr, '__getitem__') or isinstance(curr, str):
            raise TypeError(f'cannot dig into {type(curr)}')

        try:
            curr = curr[k]
        except (KeyError, IndexError):
            curr = None

    return curr


def what_you_wanted(collection, *keys):  # If I understood you correctly
    slic = keys[0] if len(keys) == 1 and isinstance(keys[0], slice) else None
    dig_keys = range(slic.stop)[slic] if slic else collapse(keys)
    return dig(collection, *dig_keys)


def test_getitem_with(*keys):
    mock = MagicMock()
    # MagicMock already supports __getitem__; each call returns a child mock,
    # so chained lookups are recorded in mock.mock_calls.
    what_you_wanted(mock, *keys)
    print(mock.mock_calls)


test_getitem_with((1, 2, 3))
test_getitem_with(('x', 'y'))
test_getitem_with((99,))
test_getitem_with([[[99]]])
test_getitem_with((1, 2), 3)
test_getitem_with(([1, [2]], [3]))
test_getitem_with(1, 2, 3)
test_getitem_with(slice(3, 20, 8))
test_getitem_with(range(3, 20, 8))

Prints:

[call.__getitem__(1),
 call.__getitem__().__getitem__(2),
 call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__('x'), call.__getitem__().__getitem__('y')]
[call.__getitem__(99)]
[call.__getitem__(99)]
[call.__getitem__(1),
 call.__getitem__().__getitem__(2),
 call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__(1),
 call.__getitem__().__getitem__(2),
 call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__(1),
 call.__getitem__().__getitem__(2),
 call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__(3),
 call.__getitem__().__getitem__(11),
 call.__getitem__().__getitem__().__getitem__(19)]
[call.__getitem__(3),
 call.__getitem__().__getitem__(11),
 call.__getitem__().__getitem__().__getitem__(19)]

For completeness, you could define a collection object (or View) that implements __getitem__() using what_you_wanted().
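
A minimal sketch of such a wrapper, under some assumptions: `ChainJar` and `flatten` are hypothetical names, `flatten` is a hand-rolled stand-in for `more_itertools.collapse`, slices must have an explicit `stop`, and, unlike `dig()`, missing keys raise instead of returning `None`.

```python
def flatten(keys):
    """Yield leaf keys from nested iterables; strings and bytes are leaves."""
    if isinstance(keys, (str, bytes)):
        yield keys
        return
    try:
        iterator = iter(keys)
    except TypeError:
        yield keys  # not iterable: a single key
        return
    for key in iterator:
        yield from flatten(key)


class ChainJar:
    """Hypothetical wrapper that chains every key it is indexed with."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, keys):
        if isinstance(keys, slice):
            # Same trick as what_you_wanted(); assumes keys.stop is not None.
            keys = range(keys.stop)[keys]
        curr = self._data
        for k in flatten(keys):
            curr = curr[k]
        return curr


jar = ChainJar({1: {2: {3: "chip"}}, 3: {11: {19: "oat"}}, 99: "solo"})
print(jar[1, 2, 3])         # chip
print(jar[[1, [2]], [3]])   # chip
print(jar[1, 2][3])         # chip (jar[1, 2] returns the plain inner dict)
print(jar[[[[99]]]])        # solo
print(jar[3:20:8])          # oat
print(jar[range(3, 20, 8)]) # oat
```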

Answered By: Kache