Resolving internal variables in YAML file

Question:

I have a YAML file that uses keys from other sections as references (variables), as in the following example:

download:
  input_data_dir: ./data/input

prepare:
  input_dir: ${download.input_data_dir}
  output_dir: ./data/prepared

process:
  version: 1
  output_dir: ./output/${process.version}

I tried loading the YAML file in Python with params = yaml.safe_load(open("../params.yaml")). This gives ${download.input_data_dir} for params['prepare']['input_dir'], while the expected value is ./data/input. Similarly, the expected value of params['process']['output_dir'] is ./output/1.

How can these variables be resolved when loading the YAML file in Python, so that I get the expected results?

Asked By: raj
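For reference, safe_load returns ${...} verbatim because YAML itself has no string interpolation; that syntax is interpreted by tools layered on top of YAML (OmegaConf, for example), not by the parser. The closest built-in feature is an anchor (&name) plus an alias (*name), which reuses a whole node but cannot be spliced into a longer string. A minimal sketch with PyYAML, where the anchor name input_data_dir is chosen here for illustration:

```python
import yaml

# The anchor &input_data_dir labels the scalar node; the alias
# *input_data_dir reuses that same value elsewhere in the document.
text = """
download:
  input_data_dir: &input_data_dir ./data/input

prepare:
  input_dir: *input_data_dir
  output_dir: ./data/prepared

process:
  version: 1
  output_dir: ./output/${process.version}
"""

params = yaml.safe_load(text)
print(params["prepare"]["input_dir"])   # ./data/input (alias resolved by the parser)
print(params["process"]["output_dir"])  # ./output/${process.version} (left as-is)
```

This covers the first case but not ./output/${process.version}, which is why the answers below hook into the loader instead.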


Answers:

You can define a custom constructor that processes such references, then add an implicit resolver that recognizes the pattern with a regular expression, so that PyYAML calls your constructor for matching scalars:

import yaml, sys, re

class RefLoader(yaml.SafeLoader):
    # we override this method to remember the root node,
    # so that we can later resolve paths relative to it
    def get_single_node(self):
        self.cur_root = super(RefLoader, self).get_single_node()
        return self.cur_root

def ref_constructor(loader, node):
    cur = loader.cur_root
    # [2:-1] gets the path inside ${...}
    for item in node.value[2:-1].split("."):
        # cur.value, if it's a mapping, contains a list
        # of (key, value) tuples
        for (key, value) in cur.value:
            # key, if it's a scalar, contains its textual
            # content in key.value
            if key.value == item:
                cur = value
                break
        else:
            # no key matched: fail loudly instead of silently
            # constructing the parent node
            raise KeyError(node.value)
    # defer construction to the default constructor of
    # the referred node
    return loader.construct_object(cur)

# register a custom tag for which our constructor is called
RefLoader.add_constructor("!ref", ref_constructor)

# tell PyYAML that a scalar that looks like `${...}` is to be
# implicitly tagged with `!ref`, so that our custom constructor
# is called.
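# $, { and } must be escaped: an unescaped $ is an end-of-string
# anchor, so the pattern would never match anything.
RefLoader.add_implicit_resolver("!ref", re.compile(r'^\$\{[^}]*\}$'), None)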

input = """
download:
  input_data_dir: ./data/input

prepare:
  input_dir: ${download.input_data_dir}
  output_dir: ./data/prepared

process:
  version: 1
  output_dir: ./output/${process.version}
"""

data = yaml.load(input, Loader=RefLoader)
yaml.dump(data, sys.stdout)

This yields:

download:
  input_data_dir: ./data/input
prepare:
  input_dir: ./data/input
  output_dir: ./data/prepared
process:
  output_dir: ./output/${process.version}
  version: 1

As you can see, this currently only processes scalars that consist entirely of a single ${...} reference; the value of process.output_dir, where the reference is embedded in a longer string, isn't processed. This code serves as an example of how to process references in general; you should be able to modify it to fit your specific needs.
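To see exactly which scalars reach the constructor, the sketch below registers the same whole-scalar pattern on a hypothetical TagProbeLoader whose constructor only reports what it received instead of resolving anything. It also shows a case not mentioned above: implicit resolvers apply only to plain (unquoted) scalars, so a quoted "${...}" is loaded as an ordinary string.

```python
import re
import yaml


class TagProbeLoader(yaml.SafeLoader):
    """Hypothetical loader that tags whole-scalar ${...} references with !ref."""


# Instead of resolving anything, report which scalars reached the constructor.
TagProbeLoader.add_constructor("!ref", lambda loader, node: ("REF", node.value))
# Same pattern as in the answer, with $, { and } escaped; None = any first char.
TagProbeLoader.add_implicit_resolver("!ref", re.compile(r"^\$\{[^}]*\}$"), None)

doc = """
whole: ${process.version}
embedded: ./output/${process.version}
quoted: "${process.version}"
"""

data = yaml.load(doc, Loader=TagProbeLoader)
print(data)
# {'whole': ('REF', '${process.version}'), 'embedded': './output/${process.version}', 'quoted': '${process.version}'}
```

Only the plain scalar that is exactly one reference gets the !ref tag; the embedded and quoted forms stay plain strings.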

Answered By: flyx

I modified flyx’s answer to get the results I wanted and am posting my solution in case it is useful to others. It resolves every ${...} reference to a key (used as a variable) in a YAML file, including references embedded in a longer string, such as ./output/${process.version}.

import yaml, sys, re

class RefLoader(yaml.SafeLoader):
    # we override this method to remember the root node,
    # so that we can later resolve paths relative to it
    def get_single_node(self):
        self.cur_root = super(RefLoader, self).get_single_node()
        return self.cur_root

def ref_constructor(loader, node):
    result = ''
    start = 0
    # walk over every ${...} occurrence in the scalar's text
    for match in pattern.finditer(node.value):
        ref_start, ref_end = match.span()
        result += node.value[start:ref_start]  # text before the reference
        matched = match.group(1)  # the dotted path inside ${...}

        cur = loader.cur_root # start from the root node
        for item in matched.split("."):
            # cur.value, if it's a mapping, contains a list
            # of (key, value) tuples
            for (key, value) in cur.value:
                # key, if it's a scalar, contains its textual
                # content in key.value
                if key.value == item:
                    cur = value
                    break
        # cur is now the referenced scalar node; .value is its raw text
        result += cur.value
        start = ref_end

    result += node.value[start:]
    return result

# register a custom tag for which our constructor is called
RefLoader.add_constructor("!ref", ref_constructor)

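# matches ${dotted.path}; group 1 is the path. $, { and } are escaped
# because they are regex metacharacters.
variable_pattern = r'\$\{([^{}]+)\}'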
pattern = re.compile(variable_pattern)

# tell PyYAML that a scalar that contains `${...}` anywhere is to be implicitly tagged with `!ref`,
# so that our custom constructor is called.
RefLoader.add_implicit_resolver("!ref", re.compile(r'.*'+ variable_pattern + '.*'), None)


input = """
download:
  input_data_dir: ./data/input

prepare:
  input_dir: ${download.input_data_dir}
  output_dir: ./data/prepared

process:
  version: 1
  output_dir: ./output/${process.version}
"""

data = yaml.load(input, Loader=RefLoader)
yaml.dump(data, sys.stdout)

This now yields the output I wanted:

download:
  input_data_dir: ./data/input
prepare:
  input_dir: ./data/input
  output_dir: ./data/prepared
process:
  output_dir: ./output/1
  version: 1

Answered By: raj
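Both answers resolve references while PyYAML builds its nodes. An alternative, not taken from either answer, is to load with plain safe_load and substitute references afterwards on the resulting dicts. The sketch below assumes dotted paths only pass through mappings with string keys; the names REF, lookup and resolve are hypothetical. Unlike the constructor above, which appends the referenced node's raw text, it also resolves chained references (a reference to a value that itself contains ${...}), keeps the target's type when a value is exactly one reference, and raises on unknown paths and cycles.

```python
import re
import yaml

# Hypothetical helper: matches ${dotted.path}, group 1 is the path.
REF = re.compile(r"\$\{([^{}]+)\}")


def lookup(root, path):
    """Follow a dotted path such as 'download.input_data_dir' from the root."""
    cur = root
    for key in path.split("."):
        cur = cur[key]  # raises KeyError for an unknown path
    return cur


def resolve(value, root, seen=()):
    """Return a copy of value with every ${...} reference substituted."""
    if isinstance(value, dict):
        return {k: resolve(v, root, seen) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, root, seen) for v in value]
    if not isinstance(value, str):
        return value

    def target(path):
        if path in seen:
            raise ValueError(f"circular reference: {path}")
        # Resolve the referenced value too, so chained references work.
        return resolve(lookup(root, path), root, seen + (path,))

    whole = REF.fullmatch(value)
    if whole:
        # The string is exactly one reference: keep the target's type.
        return target(whole.group(1))
    # Otherwise splice each reference into the string as text.
    return REF.sub(lambda m: str(target(m.group(1))), value)


text = """
download:
  input_data_dir: ./data/input

prepare:
  input_dir: ${download.input_data_dir}
  output_dir: ./data/prepared

process:
  version: 1
  output_dir: ./output/${process.version}
"""

raw = yaml.safe_load(text)
params = resolve(raw, raw)
print(params["prepare"]["input_dir"])   # ./data/input
print(params["process"]["output_dir"])  # ./output/1
```

Because substitution happens on ordinary Python objects, the loader stays the stock SafeLoader; the trade-off is that references are resolved in a separate pass rather than during parsing.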