Nested dictionaries in python with error when accessing non-existent key

Question:

I’m using nested dictionaries as implemented using the AutoVivification class answer at What is the best way to implement nested dictionaries?; namely

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

a = AutoVivification()
a['foo']['bar'] = 'spam'

thus permitting arbitrary nesting in the dictionary. Is there a way to modify the class such that one could assign values to members using an arbitrary set of keys, but only previously defined sets of keys are allowed when trying to access / read from the member? For example,

print a['foo']['bar']
print a['foo']['eggs']

currently outputs

spam
{}

It would be great if the second one gave an error instead, as a[‘foo’][‘eggs’] hasn’t been defined…

Asked By: ChrisW

||

Answers:

I don’t think there is any way to do exactly what you are trying to do, but if you are okay with a slight modification to how you set the keys you can get something pretty similar by just using regular dictionaries.

def nested_dict_set(d, keys, value):
    for k in keys[:-1]:
        d = d.setdefault(k, {})
    d[keys[-1]] = value

a = {}
nested_dict_set(a, ['foo', 'bar'], 'spam')
print a['foo']['bar']
print a['foo']['eggs']  # raises a KeyError
Answered By: Andrew Clark

The problem you’ll run into is that, in order to set an item on a nested dictionary, you must first be able to get all the parent items. For example:

d[1][2][3] = 42

requires getting d[1][2] in order to set d[1][2][3]. There’s no way to know whether an assignment is in process when you are accessing the intermediate dictionaries, so the only way to get assignment to work is to always create the sub-dictionary on access. (You could return some kind of proxy object instead of creating the sub-dictionary, and defer creation of the intermediate dictionaries until assignment, but you’re still not going to get an error when you access a path that doesn’t exist.)

The simplest way around this is to use a single tuple key rather than repeated subkeys. In other words, instead of setting d[1][2][3] you would set d[1, 2, 3]. Assignments then are self-contained operations: they do not require getting any intermediate nesting levels, so you can create the intermediate levels only on assignment.

As a bonus, you may find working with a tuple much more straightforward when passing multiple keys around, since you can just stick ’em in the [] and get the item you want.

You could do this with a single dictionary, using the tuples as keys. However, this loses the hierarchical structure of the data. The implementation below uses sub-dictionaries. A dictionary subclass called node is used so that we can assign an attribute on the dictionary to represent the value of the node at that location; this way, we can store values on the intermediate nodes as well as at the leaves. (It has a __repr__ method that shows both the node’s value and its children, if it has any.) The __setitem__ method of the tupledict class handles creating intermediate nodes when assigning an element. __getitem__ traverses the nodes to find the value you want. (If you want to access the individual nodes as nodes, you can use get() to reach them one at a time.)

class tupledict(dict):

    class node(dict):
        def __repr__(self):
            if self:
                if hasattr(self, "value"):
                    return repr(self.value) + ", " + dict.__repr__(self)
                return dict.__repr__(self)
            else:
                return repr(self.value)

    def __init__(self):
        pass

    def __setitem__(self, key, value):
        if not isinstance(key, tuple):   # handle single value
            key = [key]
        d = self
        for k in key:
            if k not in d:
                dict.__setitem__(d, k, self.node())
            d = dict.__getitem__(d, k)
        d.value = value

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = [key]
        d = self
        for k in key:
            try:
                d = dict.__getitem__(d, k)
            except KeyError:
                raise KeyError(key[0] if len(key) == 1 else key)
        try:
            return d.value
        except AttributeError:
            raise KeyError(key[0] if len(key) == 1 else key)

Usage:

td = tupledict()
td['foo', 'bar'] = 'spam'
td['foo', 'eggs']   # KeyError

key = 'foo', 'bar'
td[key]    # 'spam'
Answered By: kindall

Let’s start from dictionaries.

If you try to access a non-existing item you will get a key error

>>> d = {}
>>> d["a"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'a'

Although, you can vivify it by setting an item

>>> d["a"] = 0
>>> d
{"a": 0}

This is to say that – for your use case – you may be better off replicating this pattern. Give the responsibility to vivify your object to item setting, rather than item getting. @kindall and @Anrew Clark solutions are two different takes of this approach, let me propose a third one.

The NestedDict class

from dataclasses import dataclass

@dataclass
class NestedDict:
    ndict: dict
    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        item = self.ndict
        for k in key:
            item = item[k]
        return item
    def __setitem__(self, key, value):
        if not isinstance(key, tuple):
            key = (key,)
        item = self.ndict
        for k in key[:-1]:
            item = item.setdefault(k, {})
        item[key[-1]] = value

Let’s see how this class behaves. Getting a non-existing item raises an error:

>>> nd = NestedDict({})
>>> nd["a"]
Traceback (most recent call last):
  File "<stdin>", line 11, in __getitem__
KeyError: 'a'

Create new items by item setting

>>> nd["a"] = 0
>>> nd["b", "a"] = 1
>>> nd["b", "b"] = 2
>>> nd
NestedDict(ndict={'a': 0, 'b': {'a': 1, 'b': 2}})

The ndicts package

If you pip install ndicts you get the proper NestedDict class for free.

from ndicts import NestedDict

nd = NestedDict()
nd["a"] = 0
nd["b", "a"] = 1
nd["b", "b"] = 2
>>> nd
NestedDict({'a': 0, 'b': {'a': 1, 'b': 2}})
Answered By: edd313
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.