Python: Easily access deeply nested dict (get and set)
Question:
I’m building some Python code to read and manipulate deeply nested dicts (ultimately for interacting with JSON services, though it would be great to have for other purposes too). I’m looking for a way to easily read/set/update values deep within the dict without needing a lot of code.
@see also Python: Recursively access dict via attributes as well as index access? Curt Hagenlocher’s “DotDictify” solution there is pretty elegant. I also like what Ben Alman presents for JavaScript in http://benalman.com/projects/jquery-getobject-plugin/ It would be great to somehow combine the two.
Building off of Curt Hagenlocher and Ben Alman’s examples, it would be great in Python to have a capability like:
>>> my_obj = DotDictify()
>>> my_obj.a.b.c = {'d':1, 'e':2}
>>> print my_obj
{'a': {'b': {'c': {'d': 1, 'e': 2}}}}
>>> print my_obj.a.b.c.d
1
>>> print my_obj.a.b.c.x
None
>>> print my_obj.a.b.c.d.x
None
>>> print my_obj.a.b.c.d.x.y.z
None
Any idea if this is possible, and if so, how to go about modifying the DotDictify solution?
Alternatively, the get method could be made to accept dot notation (with a complementary set method added), though the object notation sure is cleaner.
>>> my_obj = DotDictify()
>>> my_obj.set('a.b.c', {'d':1, 'e':2})
>>> print my_obj
{'a': {'b': {'c': {'d': 1, 'e': 2}}}}
>>> print my_obj.get('a.b.c.d')
1
>>> print my_obj.get('a.b.c.x')
None
>>> print my_obj.get('a.b.c.d.x')
None
>>> print my_obj.get('a.b.c.d.x.y.z')
None
This type of interaction would be great to have for dealing with deeply nested dicts. Does anybody know another strategy (or sample code snippet/library) to try?
Answers:
I used something similar to build a Trie for an application. I hope it helps.
class Trie:
    """
    A Trie is like a dictionary in that it maps keys to values.
    However, because of the way keys are stored, it allows
    lookup based on the longest prefix that matches.
    """

    def __init__(self):
        # Every node is a list with two positions. The first one holds
        # the value (here, a count of keys passing through the node),
        # and the second one holds a dictionary leading to the child nodes.
        self.root = [0, {}]

    def insert(self, key):
        """
        Add the given key, incrementing the count on every node along its path.

        >>> a = Trie()
        >>> a.insert('kalo')
        >>> print(a)
        [0, {'k': [1, {'a': [1, {'l': [1, {'o': [1, {}]}]}]}]}]
        >>> a.insert('kalo')
        >>> print(a)
        [0, {'k': [2, {'a': [2, {'l': [2, {'o': [2, {}]}]}]}]}]
        >>> b = Trie()
        >>> b.insert('heh')
        >>> b.insert('ha')
        >>> print(b)
        [0, {'h': [2, {'e': [1, {'h': [1, {}]}], 'a': [1, {}]}]}]
        """
        # Walk down the tree, creating missing nodes and bumping counts.
        curr_node = self.root
        for k in key:
            curr_node = curr_node[1].setdefault(k, [0, {}])
            curr_node[0] += 1

    def find(self, key):
        """
        Return the value for the given key, or 0 if the key is not
        found.

        >>> a = Trie()
        >>> a.insert('ha')
        >>> a.insert('ha')
        >>> a.insert('he')
        >>> a.insert('ho')
        >>> print(a.find('h'))
        4
        >>> print(a.find('ha'))
        2
        >>> print(a.find('he'))
        1
        """
        curr_node = self.root
        for k in key:
            try:
                curr_node = curr_node[1][k]
            except KeyError:
                return 0
        return curr_node[0]

    def __str__(self):
        return str(self.root)

    def __getitem__(self, key):
        # Yield (child, count) pairs for the node reached by `key`,
        # or a single None if that prefix is not in the trie.
        curr_node = self.root
        for k in key:
            try:
                curr_node = curr_node[1][k]
            except KeyError:
                yield None
                return
        for k in curr_node[1]:
            yield k, curr_node[1][k][0]


if __name__ == '__main__':
    a = Trie()
    a.insert('kalo')
    a.insert('kala')
    a.insert('kal')
    a.insert('kata')
    print(a.find('kala'))
    for b in a['ka']:
        print(b)
    print(a)
Attribute Tree
The problem with your first specification is that Python can’t tell in __getitem__ if, at my_obj.a.b.c.d, you will next proceed farther down a nonexistent tree, in which case it needs to return an object with a __getitem__ method so you won’t get an AttributeError thrown at you, or if you want a value, in which case it needs to return None.
I would argue that in every case you have above, you should expect it to throw a KeyError instead of returning None, because you can’t tell whether None means “no key” or “someone actually stored None at that location”. For this behavior, all you have to do is take dotdictify, remove marker, and replace __getitem__ with:
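def __getitem__(self, key):
    # Plain dict lookup: raises KeyError for a missing key.
    # (Writing self[key] here would call this method again forever.)
    return dict.__getitem__(self, key)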
Because what you really want is a dict with __getattr__ and __setattr__.
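The None ambiguity can be shown with plain dicts. This is a minimal sketch that assumes a hypothetical deep_get helper and a private _MISSING sentinel; neither comes from dotdictify. Using a unique sentinel object as the "no default" marker lets the helper raise KeyError for a missing key while still returning a stored None unchanged.

```python
_MISSING = object()  # unique sentinel; no caller can store this by accident


def deep_get(d, keys, default=_MISSING):
    """Walk nested dicts along `keys`; raise KeyError unless a default is given."""
    node = d
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            if default is _MISSING:
                raise KeyError(key)
            return default
        node = node[key]
    return node


data = {'a': {'b': None}}
print(deep_get(data, ['a', 'b']))                # None: the stored value
print(deep_get(data, ['a', 'x'], default=None))  # None: the key is missing
try:
    deep_get(data, ['a', 'x'])
except KeyError as exc:
    print('missing:', exc)                       # missing: 'x'
```

The first two calls print the same thing for different reasons. Only the third call, which has no default, tells the caller that the key is absent.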
There may be a way to remove __getitem__ entirely and say something like __getattr__ = dict.__getitem__, but I think this may be over-optimization, and will be a problem if you later decide you want __getitem__ to create the tree as it goes like dotdictify originally does, in which case you would change it to:
def __getitem__(self, key):
    if key not in self:
        dict.__setitem__(self, key, dotdictify())
    return dict.__getitem__(self, key)
I don’t like the marker business in the original dotdictify.
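For reference, here is a minimal sketch of the __getattr__ = dict.__getitem__ aliasing mentioned above. The class name AttrDict is hypothetical. Because it reuses dict’s own item methods, a missing attribute raises KeyError, as argued above. Nested plain dicts are not converted, so chaining only works through values that are themselves AttrDict instances.

```python
class AttrDict(dict):
    """Hypothetical dict whose attributes map straight onto its keys."""

    # Attribute reads and writes reuse dict's own item methods.
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__


cfg = AttrDict(a=AttrDict(b=1))
print(cfg.a.b)  # 1
cfg.a.c = 2
print(cfg)      # {'a': {'b': 1, 'c': 2}}
try:
    cfg.missing
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'missing'
```

One side effect is that hasattr() only treats AttributeError as "not present". On this class it lets the KeyError propagate instead of returning False.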
Path Support
The problem with your second specification (overriding get() and set()) is that a normal dict has a get() that operates differently from what you describe, and it doesn’t even have a set() (though it has setdefault(), which combines a get() with a conditional set). People expect get() to take two parameters, the second being a default returned when the key isn’t found.
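One way to meet that expectation is a dotted get() that keeps dict.get’s (key, default) signature, plus a separate set(). Below is a minimal sketch. It assumes a hypothetical PathDict subclass, which is not part of dotdictify, and keys that never contain a literal '.'. Unlike dotdictify, it leaves nested values as plain dicts.

```python
class PathDict(dict):
    """Hypothetical dict with dotted-path get/set helpers."""

    def get(self, path, default=None):
        # Same signature as dict.get: return `default` if any segment is missing.
        node = self
        for key in path.split('.'):
            if not isinstance(node, dict) or key not in node:
                return default
            node = node[key]
        return node

    def set(self, path, value):
        # Create intermediate dicts as needed, then store the final value.
        *parents, last = path.split('.')
        node = self
        for key in parents:
            node = node.setdefault(key, {})
        node[last] = value


my_obj = PathDict()
my_obj.set('a.b.c', {'d': 1, 'e': 2})
print(my_obj)                        # {'a': {'b': {'c': {'d': 1, 'e': 2}}}}
print(my_obj.get('a.b.c.d'))         # 1
print(my_obj.get('a.b.c.d.x.y.z'))   # None
print(my_obj.get('a.b.c.x', 'n/a'))  # n/a
```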
If you want to extend __getitem__ and __setitem__ to handle dotted-key notation, you’ll need to modify dotdictify to:
class dotdictify(dict):
    def __init__(self, value=None):
        if value is None:
            pass
        elif isinstance(value, dict):
            for key in value:
                self.__setitem__(key, value[key])
        else:
            raise TypeError('expected dict')

    def __setitem__(self, key, value):
        if '.' in key:
            myKey, restOfKey = key.split('.', 1)
            target = self.setdefault(myKey, dotdictify())
            if not isinstance(target, dotdictify):
                raise KeyError('cannot set "%s" in "%s" (%s)' % (restOfKey, myKey, repr(target)))
            target[restOfKey] = value
        else:
            # Wrap plain dicts so attribute access keeps working below this level.
            if isinstance(value, dict) and not isinstance(value, dotdictify):
                value = dotdictify(value)
            dict.__setitem__(self, key, value)

    def __getitem__(self, key):
        if '.' not in key:
            return dict.__getitem__(self, key)
        myKey, restOfKey = key.split('.', 1)
        target = dict.__getitem__(self, myKey)
        if not isinstance(target, dotdictify):
            raise KeyError('cannot get "%s" in "%s" (%s)' % (restOfKey, myKey, repr(target)))
        return target[restOfKey]

    def __contains__(self, key):
        if '.' not in key:
            return dict.__contains__(self, key)
        myKey, restOfKey = key.split('.', 1)
        # dict.get avoids a KeyError when the first segment is missing.
        target = dict.get(self, myKey)
        if not isinstance(target, dotdictify):
            return False
        return restOfKey in target

    def setdefault(self, key, default):
        if key not in self:
            self[key] = default
        return self[key]

    __setattr__ = __setitem__
    __getattr__ = __getitem__
Test code:
>>> life = dotdictify({'bigBang': {'stars': {'planets': {}}}})
>>> life.bigBang.stars.planets
{}
>>> life.bigBang.stars.planets.earth = { 'singleCellLife' : {} }
>>> life.bigBang.stars.planets
{'earth': {'singleCellLife': {}}}
>>> life['bigBang.stars.planets.mars.landers.vikings'] = 2
>>> life.bigBang.stars.planets.mars.landers.vikings
2
>>> 'landers.vikings' in life.bigBang.stars.planets.mars
True
>>> life.get('bigBang.stars.planets.mars.landers.spirit', True)
True
>>> life.setdefault('bigBang.stars.planets.mars.landers.opportunity', True)
True
>>> 'landers.opportunity' in life.bigBang.stars.planets.mars
True
>>> life.bigBang.stars.planets.mars
{'landers': {'vikings': 2, 'opportunity': True}}
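One caveat about the life.get(...) line in the test above. dict.get does its own lookup and does not call an overridden __getitem__, so it never walks the dotted path. That line returns True only because there is no top-level key literally named 'bigBang.stars.planets.mars.landers.spirit'. By the same token, life.get('bigBang.stars.planets.mars.landers.vikings') returns None even though that path exists. Here is a small demonstration with a hypothetical subclass:

```python
class Upper(dict):
    """Hypothetical subclass whose [] lookup upper-cases the key first."""

    def __getitem__(self, key):
        return dict.__getitem__(self, key.upper())


d = Upper({'A': 1})
print(d['a'])      # 1: goes through the overridden __getitem__
print(d.get('a'))  # None: dict.get ignores the override
```

If dotted lookups through get() matter, get needs its own override, the same way dotdictify already overrides setdefault.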
The older answers have some pretty good tips in them, but they all require replacing standard Python data structures (dicts, etc.) with custom ones, and would not work with keys that are not valid attribute names.
These days we can do better, using a pure-Python, Python 2/3-compatible library, built for exactly this purpose, called glom. Using your example:
import glom
target = {} # a plain dictionary we will deeply set on
glom.assign(target, 'a.b.c', {'d': 1, 'e': 2}, missing=dict)
# {'a': {'b': {'c': {'d': 1, 'e': 2}}}}
Notice the missing=dict, used to auto-create dictionaries. We can easily get the value back using glom’s deep-get:
glom.glom(target, 'a.b.c.d')
# 1
There’s a lot more you can do with glom, especially around deep getting and setting. I should know, since (full disclosure) I created it. That means if you find a gap, you should let me know!
Not a full-fledged solution, but a simple approach with no dependencies, and which doesn’t require replacing/modifying the built-in dictionary type. Might fit the bill for some:
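from functools import reduce

def get_nested(nested_dict: dict, key: str):
    # Index one level deeper for each dot-separated segment.
    return reduce(lambda d, k: d[k], key.split('.'), nested_dict)

my_dict = {'a': {'b': {'c': 123}}}
get_nested(my_dict, "a.b.c")  # 123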
The setter is not quite as nice, but works:
def set_nested(nested_dict: dict, key: str, value):
    *keys, last_key = key.split('.')
    # Walk (and create, if missing) each intermediate dict.
    for k in keys:
        if k not in nested_dict:
            nested_dict[k] = dict()
        nested_dict = nested_dict[k]
    nested_dict[last_key] = value

set_nested(my_dict, "very.very.many.levels", True)
A more full-fledged solution should probably check the keys accessed along the way, and probably handle other cases I haven’t thought about yet.
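As one example of checking the keys along the way, here is a minimal sketch. The name set_path is hypothetical and does not come from the answer above. Like the setter above, it creates missing intermediate dicts. When an intermediate value is not a dict, it raises a descriptive TypeError; the setter above would instead fail with a less informative error from indexing into that value.

```python
from typing import Any


def set_path(nested: dict, path: str, value: Any) -> None:
    """Store `value` at dotted `path`, creating missing dicts along the way.

    Raises TypeError instead of descending into (or clobbering) a
    non-dict value that sits in the middle of the path.
    """
    *parents, last = path.split('.')
    node = nested
    for depth, key in enumerate(parents, start=1):
        child = node.setdefault(key, {})
        if not isinstance(child, dict):
            where = '.'.join(parents[:depth])
            raise TypeError(
                f"cannot set {path!r}: {where!r} holds "
                f"{type(child).__name__!r}, not a dict")
        node = child
    node[last] = value


data = {'a': {'b': 5}}
set_path(data, 'a.c.d', True)
print(data)  # {'a': {'b': 5, 'c': {'d': True}}}
try:
    set_path(data, 'a.b.x', 1)
except TypeError as exc:
    print(exc)  # cannot set 'a.b.x': 'a.b' holds 'int', not a dict
```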