Load YAML as nested objects instead of dictionary in Python
Question:
I have a configuration file in YAML that is currently loaded as a dictionary using yaml.safe_load. For convenience in writing my code, I’d prefer to load it as a set of nested objects. It’s cumbersome to refer to deeper levels of the dictionary and makes the code harder to read.
Example:
import yaml
mydict = yaml.safe_load("""
a: 1
b:
  - q: "foo"
    r: 99
    s: 98
  - x: "bar"
    y: 97
    z: 96
c:
  d: 7
  e: 8
  f: [9,10,11]
""")
Currently, I access items like
mydict["b"][0]["r"]
>>> 99
What I’d like to be able to do is access the same information like
mydict.b[0].r
>>> 99
Is there a way to load YAML as nested objects like this? Or will I have to roll my own class and recursively flip these dictionaries into nested objects? I’m guessing namedtuple could make this a bit easier, but I’d prefer an off-the-shelf solution for the whole thing.
Answers:
If you annotate the root node of the YAML file with a tag, you can define Python classes deriving from YAMLObject to deal with this, as described in the PyYAML documentation.
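As a rough sketch of what the tagged approach could look like: the class names Root and Entry and the tags !Root and !Entry below are made up for illustration, and every mapping that should become an object needs its own tag. Setting yaml_loader to yaml.SafeLoader registers the classes on the safe loader (globally, for the whole process), so yaml.safe_load can still be used.

```python
import yaml


class Root(yaml.YAMLObject):
    # Hypothetical tag; it must match the tag used in the YAML document.
    yaml_tag = "!Root"
    # Register on SafeLoader so that yaml.safe_load() recognises the tag.
    yaml_loader = yaml.SafeLoader


class Entry(yaml.YAMLObject):
    yaml_tag = "!Entry"
    yaml_loader = yaml.SafeLoader


TAGGED = """
--- !Root
a: 1
b:
  - !Entry
    q: "foo"
    r: 99
    s: 98
  - !Entry
    x: "bar"
    y: 97
    z: 96
c: !Entry
  d: 7
  e: 8
  f: [9, 10, 11]
"""

root = yaml.safe_load(TAGGED)
print(root.b[0].r)
print(root.c.f)
```

Mappings without a tag would still come back as plain dicts, which is the main drawback of this route.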
However, if you prefer your YAML to stay clean from tags, you can construct the nested classes yourself (taken from my answer to a similar question):
import yaml

class BItem:
    def __init__(self, q, r, s):
        self.q, self.r, self.s = q, r, s

class CItem:
    def __init__(self, raw):
        self.d, self.e, self.f = raw['d'], raw['e'], raw['f']

class Root:
    def __init__(self, raw):
        self.a = raw['a']
        self.b = [BItem(i['q'], i['r'], i['s']) for i in raw['b']]
        self.c = CItem(raw['c'])

mydict = Root(yaml.safe_load("""
a: 1
b:
  - q: "foo"
    r: 99
    s: 98
  - q: "bar"
    r: 97
    s: 96
c:
  d: 7
  e: 8
  f: [9,10,11]
"""))
However, this approach only works if your YAML is structured homogeneously. You gave a heterogeneous structure by having differently named fields in the list of b (q, r, s in the first item; x, y, z in the second item). I changed the YAML input to use the same field names, because this approach does not work with differing fields. I am unsure whether your YAML is actually heterogeneous or you just accidentally made it so for the example. If your YAML actually is heterogeneous, accessing the items via dict access is the only viable way, since the keys in the YAML file then do not correspond to class fields; they are dynamic mapping entries.
This can be done relatively easily, and without changing the input file.
Since the dict PyYAML uses is hard-coded and cannot be patched, you not only have to provide a dict-like class that behaves as you want, you also have to jump through some hoops to make PyYAML use that class: change the SafeConstructor that would normally construct a dict to use the new class, incorporate that in a new Loader, and pass that Loader to PyYAML’s load:
import yaml
from yaml.loader import Reader, Scanner, Parser, Composer, SafeConstructor, Resolver

class MyDict(dict):
    def __getattr__(self, name):
        # Raise AttributeError rather than KeyError so hasattr() and copy behave normally
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

class MySafeConstructor(SafeConstructor):
    def construct_yaml_map(self, node):
        # Yield the empty mapping first, then fill it in
        data = MyDict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

MySafeConstructor.add_constructor(
    u'tag:yaml.org,2002:map', MySafeConstructor.construct_yaml_map)

class MySafeLoader(Reader, Scanner, Parser, Composer, MySafeConstructor, Resolver):
    def __init__(self, stream):
        Reader.__init__(self, stream)
        Scanner.__init__(self)
        Parser.__init__(self)
        Composer.__init__(self)
        MySafeConstructor.__init__(self)
        Resolver.__init__(self)

yaml_str = """
a: 1
b:
  - q: "foo"
    r: 99
    s: 98
  - x: "bar"
    y: 97
    z: 96
c:
  d: 7
  e: 8
  f: [9,10,11]
"""

mydict = yaml.load(yaml_str, Loader=MySafeLoader)
print(mydict.b[0].r)
which gives:
99
If you need to be able to handle YAML 1.2, you should use ruamel.yaml (disclaimer: I am the author of that package), which makes the above slightly simpler:
import ruamel.yaml

# same definitions for yaml_str, MyDict

class MySafeConstructor(ruamel.yaml.constructor.SafeConstructor):
    def construct_yaml_map(self, node):
        data = MyDict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

MySafeConstructor.add_constructor(
    u'tag:yaml.org,2002:map', MySafeConstructor.construct_yaml_map)

yaml = ruamel.yaml.YAML(typ='safe')
yaml.Constructor = MySafeConstructor
mydict = yaml.load(yaml_str)
print(mydict.b[0].r)
which also gives:
99
(and if your real input is large, it should load your data noticeably faster)
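For PyYAML, the same idea can probably be expressed a little more compactly by subclassing yaml.SafeLoader directly instead of reassembling the loader from its parts. The following is only a sketch under that assumption; AttrDict, AttrLoader and construct_attr_map are illustrative names, not part of PyYAML.

```python
import yaml


class AttrDict(dict):
    """dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


class AttrLoader(yaml.SafeLoader):
    """SafeLoader variant that builds AttrDict instead of dict."""


def construct_attr_map(loader, node):
    # Same two-step pattern PyYAML uses for its own mappings:
    # yield the empty container first, then fill it in.
    data = AttrDict()
    yield data
    data.update(loader.construct_mapping(node))


# add_constructor on the subclass copies the constructor table,
# so yaml.SafeLoader itself is left untouched.
AttrLoader.add_constructor("tag:yaml.org,2002:map", construct_attr_map)

text = """
a: 1
b:
  - q: "foo"
    r: 99
    s: 98
  - x: "bar"
    y: 97
    z: 96
c:
  d: 7
  e: 8
  f: [9, 10, 11]
"""

cfg = yaml.load(text, Loader=AttrLoader)
print(cfg.b[0].r)
```

Because the result is still a dict subclass, cfg["b"][0]["r"] keeps working alongside cfg.b[0].r. Keys that clash with dict method names (for example items or keys) are only reachable through the subscript form.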
Found a handy library to do exactly what I need:
https://github.com/Infinidat/munch
import yaml
from munch import Munch
mydict = yaml.safe_load("""
a: 1
b:
  - q: "foo"
    r: 99
    s: 98
  - x: "bar"
    y: 97
    z: 96
c:
  d: 7
  e: 8
  f: [9,10,11]
""")
mymunch = Munch(mydict)
I had to write a simple method to recursively convert all sub-dicts into munches, but now I can navigate my data with, e.g.:
>>> mymunch.b[0].q
'foo'
Using a SimpleNamespace will work at the top level, but won’t translate nested structures.
import types
import yaml
dct = yaml.safe_load(...)
obj = types.SimpleNamespace(**dct)
To achieve full object-tree translation:
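import json
import types
import yaml

def load_object(dct):
    return types.SimpleNamespace(**dct)

dct = yaml.safe_load(...)
# Round-tripping through JSON only works if the data is JSON-serializable
obj = json.loads(json.dumps(dct), object_hook=load_object)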
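The JSON round trip can fail on values that YAML supports but JSON does not (dates, for instance). If that matters, a small recursive converter avoids the detour. This is a sketch only; to_namespace is a made-up helper name, and it assumes the top-level YAML node is a mapping.

```python
import types

import yaml


def to_namespace(value):
    """Recursively turn dicts into SimpleNamespace objects; lists are walked too."""
    if isinstance(value, dict):
        # str(k) keeps non-string keys (e.g. integers) usable as keyword names
        return types.SimpleNamespace(
            **{str(k): to_namespace(v) for k, v in value.items()}
        )
    if isinstance(value, list):
        return [to_namespace(v) for v in value]
    return value


doc = """
a: 1
b:
  - q: "foo"
    r: 99
    s: 98
  - x: "bar"
    y: 97
    z: 96
c:
  d: 7
  e: 8
  f: [9, 10, 11]
"""

obj = to_namespace(yaml.safe_load(doc))
print(obj.b[0].r)
print(obj.c.f)
```

Unlike the dict-subclass approaches, the result no longer supports obj["b"]; keys that are not valid Python identifiers remain reachable only through getattr.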