Load YAML as nested objects instead of dictionary in Python

Question:

I have a configuration file in YAML that is currently loaded as a dictionary using yaml.safe_load. For convenience in writing my code, I’d prefer to load it as a set of nested objects. It’s cumbersome to refer to deeper levels of the dictionary and makes the code harder to read.

Example:

import yaml
mydict = yaml.safe_load("""
a: 1
b:
- q: "foo"
  r: 99
  s: 98
- x: "bar"
  y: 97
  z: 96
c:
  d: 7
  e: 8
  f: [9,10,11]
""")

Currently, I access items like

mydict["b"][0]["r"]
>>> 99

What I’d like to be able to do is access the same information like

mydict.b[0].r
>>> 99

Is there a way to load YAML as nested objects like this? Or will I have to roll my own class and recursively flip these dictionaries into nested objects? I’m guessing namedtuple could make this a bit easier, but I’d prefer an off-the-shelf solution for the whole thing.

Asked By: spencerrecneps


Answers:

If you annotate the root node of the YAML file with a tag, you can define Python classes deriving from YAMLObject to deal with this as described in the PyYAML documentation.

However, if you prefer your YAML to stay free of tags, you can construct the nested classes yourself (taken from my answer to a similar question):

import yaml

class BItem:
    def __init__(self, q, r, s):
        self.q, self.r, self.s = q, r, s

class CItem:
    def __init__(self, raw):
        self.d, self.e, self.f = raw['d'], raw['e'], raw['f']

class Root:
    def __init__(self, raw):
        self.a = raw['a']
        self.b = [BItem(i['q'], i['r'], i['s']) for i in raw['b']]
        self.c = CItem(raw['c'])

mydict = Root(yaml.safe_load("""
a: 1
b:
- q: "foo"
  r: 99
  s: 98
- q: "bar"
  r: 97
  s: 96
c:
  d: 7
  e: 8
  f: [9,10,11]
"""))

However, this approach only works if your YAML is structured homogeneously. You gave a heterogeneous structure by having differently named fields in the list of b (q, r, s in the first item; x, y, z in the second item). I changed the YAML input to have the same field names because with different fields, this approach does not work. I am unsure whether your YAML is actually heterogeneous or you just accidentally made it so for an example. If your YAML actually is heterogeneous, accessing the items via dict access is the only viable way since then, the keys in the YAML file do not correspond to class fields; they are dynamic mapping entries.

Answered By: flyx

This can be done, relatively easily, and without changing the input file.

Since the dict that PyYAML uses is hard-coded and cannot be patched, you not only have to provide a dict-like class that behaves as you want, you also have to jump through some hoops to make PyYAML use that class: change the SafeConstructor that would normally construct a dict so that it uses the new class, incorporate that constructor in a new Loader, and pass that Loader to PyYAML’s load:

import yaml

from yaml.loader import Reader, Scanner, Parser, Composer, SafeConstructor, Resolver

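class MyDict(dict):
    # Called only when normal attribute lookup fails, so dict methods still work.
    def __getattr__(self, name):
        return self[name]

class MySafeConstructor(SafeConstructor):
    def construct_yaml_map(self, node):
        # Yield the empty mapping first and fill it afterwards, so that
        # recursive structures (anchors/aliases) can refer to it.
        data = MyDict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

# Use the constructor above for every plain YAML mapping.
MySafeConstructor.add_constructor(
    u'tag:yaml.org,2002:map', MySafeConstructor.construct_yaml_map)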


class MySafeLoader(Reader, Scanner, Parser, Composer, MySafeConstructor, Resolver):
    def __init__(self, stream):
        Reader.__init__(self, stream)
        Scanner.__init__(self)
        Parser.__init__(self)
        Composer.__init__(self)
        MySafeConstructor.__init__(self)
        Resolver.__init__(self)


yaml_str = """
a: 1
b:
- q: "foo"
  r: 99
  s: 98
- x: "bar"
  y: 97
  z: 96
c:
  d: 7
  e: 8
  f: [9,10,11]
"""

mydict = yaml.load(yaml_str, Loader=MySafeLoader)

print(mydict.b[0].r)

which gives:

99
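
One caveat with a `__getattr__` that simply returns `self[name]`, as in MyDict: a missing key surfaces as a KeyError rather than an AttributeError, so `hasattr()` and `getattr()` with a default raise instead of falling back. A minimal sketch of a variant that translates the exception (the name `AttrDict` is made up for illustration and is not part of PyYAML):

```python
class AttrDict(dict):
    """Dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            # Attribute protocols (hasattr, getattr with a default)
            # expect AttributeError, not KeyError.
            raise AttributeError(name) from None


cfg = AttrDict(a=1)
print(cfg.a)                                # 1
print(hasattr(cfg, "missing"))              # False
print(getattr(cfg, "missing", "fallback"))  # fallback
```

Because it is still a dict subclass, it should be usable in place of MyDict in the constructor above.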

If you need to handle YAML 1.2, you should use ruamel.yaml (disclaimer: I am the author of that package), which makes the above slightly simpler:

import ruamel.yaml

# same definitions for yaml_str, MyDict

class MySafeConstructor(ruamel.yaml.constructor.SafeConstructor):
    def construct_yaml_map(self, node):
        data = MyDict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

MySafeConstructor.add_constructor(
    u'tag:yaml.org,2002:map', MySafeConstructor.construct_yaml_map)


yaml = ruamel.yaml.YAML(typ='safe')
yaml.Constructor = MySafeConstructor
mydict = yaml.load(yaml_str)

print(mydict.b[0].r)

which also gives:

99

(And if your real input is large, this should load your data noticeably faster.)

Answered By: Anthon

Found a handy library to do exactly what I need:
https://github.com/Infinidat/munch

import yaml
from munch import Munch
mydict = yaml.safe_load("""
a: 1
b:
- q: "foo"
  r: 99
  s: 98
- x: "bar"
  y: 97
  z: 96
c:
  d: 7
  e: 8
  f: [9,10,11]
""")
mymunch = Munch(mydict)
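
Note that `Munch(mydict)` wraps only the top-level mapping; nested dicts and dicts inside lists stay plain dicts. A minimal sketch of a recursive conversion follows. To keep it runnable without munch or PyYAML installed, it uses a small stand-in class (`DotDict`, a hypothetical name) instead of Munch and a dict literal instead of the result of yaml.safe_load; the helper name `to_dot` is likewise made up for illustration.

```python
class DotDict(dict):
    """Stand-in for munch.Munch: a dict whose keys are also attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def to_dot(value):
    """Recursively wrap every dict, including dicts nested inside lists."""
    if isinstance(value, dict):
        return DotDict({key: to_dot(item) for key, item in value.items()})
    if isinstance(value, list):
        return [to_dot(item) for item in value]
    return value  # scalars are returned unchanged


# Same data as the YAML in the question, written as a literal.
data = {
    "a": 1,
    "b": [
        {"q": "foo", "r": 99, "s": 98},
        {"x": "bar", "y": 97, "z": 96},
    ],
    "c": {"d": 7, "e": 8, "f": [9, 10, 11]},
}

config = to_dot(data)
print(config.b[0].q)  # foo
print(config.c.f[1])  # 10
```

With munch installed, its `munchify` function appears to do the same job for Munch objects; check the project README before relying on it.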

I had to write a simple method to recursively convert all sub-dicts into munches, but now I can navigate my data with, e.g.:

>>> mymunch.b[0].q
'foo'
Answered By: spencerrecneps

Using a SimpleNamespace will work at the top level, but won’t translate nested structures.

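import types
import yaml

dct = yaml.safe_load(...)
obj = types.SimpleNamespace(**dct)

To achieve full object-tree translation, round-trip the data through JSON and let object_hook turn every mapping into a SimpleNamespace:

import json

def load_object(dct):
    return types.SimpleNamespace(**dct)

dct = yaml.safe_load(...)
obj = json.loads(json.dumps(dct), object_hook=load_object)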
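
A worked version of the JSON round-trip on the question's data may help. This is a sketch that assumes a dict literal in place of the result of yaml.safe_load, so it runs with the standard library alone. It also shows one limitation: values with no JSON representation, such as the `datetime.date` objects that safe_load produces for unquoted dates, make `json.dumps` fail.

```python
import datetime
import json
import types


def load_object(dct):
    # Called by json.loads for every decoded JSON object (mapping).
    return types.SimpleNamespace(**dct)


# Same data as the YAML in the question, written as a literal.
data = {
    "a": 1,
    "b": [
        {"q": "foo", "r": 99, "s": 98},
        {"x": "bar", "y": 97, "z": 96},
    ],
    "c": {"d": 7, "e": 8, "f": [9, 10, 11]},
}

obj = json.loads(json.dumps(data), object_hook=load_object)
print(obj.b[0].r)  # 99
print(obj.c.f)     # [9, 10, 11]

# A date value cannot be serialized by the default JSON encoder.
try:
    json.dumps({"released": datetime.date(2020, 1, 1)})
    round_trip_ok = True
except TypeError:
    round_trip_ok = False
print(round_trip_ok)  # False
```

If the configuration contains such values, a recursive conversion that walks dicts and lists directly avoids the JSON step.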
Answered By: ds77