yaml anchors definitions loading in PyYAML

Question:

I’m using PyYAML.
Is there a way to define a YAML anchor so that it does not become part of the data structure loaded by yaml.load? (I can remove “wifi_parm” from the dictionary afterwards, but I am looking for a smarter way.)

example.yaml:

wifi_parm: &wifi_params
  ssid: 1
  key: 2
test1:
  name: connectivity
  <<: *wifi_params
test2:
  name: connectivity_5ghz
  <<: *wifi_params

load_example.py:

import yaml
import pprint

# Read the file shown above; safe_load avoids the missing-Loader error in recent PyYAML
with open('example.yaml', 'r') as f:
    result = yaml.safe_load(f)
pprint.pprint(result)

prints:

{'test1': {'key': 2, 'name': 'connectivity', 'ssid': 1},
 'test2': {'key': 2, 'name': 'connectivity_5ghz', 'ssid': 1},
 'wifi_parm': {'key': 2, 'ssid': 1}}

I need:

{'test1': {'key': 2, 'name': 'connectivity', 'ssid': 1},
 'test2': {'key': 2, 'name': 'connectivity_5ghz', 'ssid': 1}}
Asked By: adi


Answers:

PyYAML discards the anchor information before you get the result from yaml.load(). This follows the YAML 1.1 specification that PyYAML implements (… anchor names are a serialization detail and are discarded once composing is completed), and it has not changed in the YAML 1.2 specification (from 2009). Walking (recursively) over the loaded result and testing which values might have been anchors therefore cannot work in PyYAML; you would have to modify the parser extensively.
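To illustrate where the anchor names are lost, here is a minimal sketch that uses only PyYAML. It compares the low-level parser events (which still carry anchor names) with the loaded result (which does not). The DOC string is a shortened copy of the question's file, and the variable names are chosen for this illustration only.

```python
import yaml

DOC = """\
wifi_parm: &wifi_params
  ssid: 1
  key: 2
test1:
  name: connectivity
  <<: *wifi_params
"""

# Parser events still know about anchors: node events that define an
# anchor have a non-empty .anchor, and alias events name the anchor they use.
defined = [
    ev.anchor
    for ev in yaml.parse(DOC)
    if not isinstance(ev, yaml.AliasEvent) and getattr(ev, "anchor", None)
]
aliased = [ev.anchor for ev in yaml.parse(DOC) if isinstance(ev, yaml.AliasEvent)]
print(defined, aliased)

# The loaded data consists of plain dicts and ints; nothing in it records
# that 'wifi_parm' was the anchored node.
data = yaml.safe_load(DOC)
print(type(data["wifi_parm"]), data)
```

The events could in principle be used to work out which keys were anchored, but matching events back to the loaded values is exactly the bookkeeping PyYAML does not do for you.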

In my ruamel.yaml (which is YAML 1.2), round-trip mode preserves the anchors and aliases for anchors that are actually used to alias mappings or sequences (anchors and aliases are currently not preserved for scalars, nor are "unused" anchors):

import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()  # round-trip mode is the default

with open('example.yaml') as f:
    result = yaml.load(f)

yaml.dump(result, sys.stdout)

gives:

wifi_parm: &wifi_params
  ssid: 1
  key: 2
test1:
  <<: *wifi_params
  name: connectivity
test2:
  <<: *wifi_params
  name: connectivity_5ghz

You can then walk the mapping (or, recursively, the tree), find the anchored node and delete it, without knowing the key's name:

import sys
import ruamel.yaml
from ruamel.yaml.comments import merge_attrib

yaml = ruamel.yaml.YAML()
with open('example.yaml') as f:
    result = yaml.load(f)

keys_to_delete = []
for k in result:
    v = result[k]
    if v.yaml_anchor():
        keys_to_delete.append(k)
    for merge_data in v.merge:  # update the dict with the merge data 
        v.update(merge_data[1])
        delattr(v, merge_attrib)
for k in keys_to_delete:
    del result[k]

yaml.dump(result, sys.stdout)

gives:

test1:
  name: connectivity
  ssid: 1
  key: 2
test2:
  name: connectivity_5ghz
  ssid: 1
  key: 2

Doing this generically and recursively (i.e. for anchors and aliases anywhere in the tree) is possible as well. The update would be as easy as above, but you would need to keep track of how to delete each anchored node, which does not have to be a mapping value: it could also be a sequence item or a scalar.

Answered By: Anthon

I wanted to do this today too. Instead of switching to ruamel.yaml as @Anthon suggests, I found the pyyaml-keep-anchors repository, which allowed me to continue using pyyaml. Here’s the example from that repo, which worked out of the box for me.

import yaml
from yaml_keep_anchors.yaml_anchor_parser import AliasResolverYamlLoader

with open('example/example.yaml', 'r') as fh:
    data = yaml.load(fh, Loader=AliasResolverYamlLoader)

assert data['key_three'].anchor_name == 'anchor'
assert data['key_two']['sub_key'].anchor_name == 'anchor_val'

Update: the following example shows (in reply to the author of ruamel.yaml) that scalars can indeed be checked to see whether they are aliases.

Yaml file:

  wifi_parm: &wifi_params
    ssid: 1
    key: &key some_key_here
  test1:
    name: connectivity
    key: *key
  test2:
    name: connectivity_5ghz
    key: *key

Python code:

import yaml
from yaml_keep_anchors.yaml_anchor_parser import AliasResolverYamlLoader

with open('test.yaml', 'r') as f:
    result = yaml.load(f, Loader=AliasResolverYamlLoader)
print(result["test1"]["key"].__dict__)

This prints

{'_wrapped': 'some_key_here', '_anchor': 'key'}

because the referenced key is an alias.

Answered By: Ani

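Put all anchor definitions under one dedicated top-level key (here __anchors__) and pop that key after loading. This approach supports any number of anchors, and omitting the anchors section does not cause an error.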

example.yaml:

__anchors__:
  wifi_parm: &wifi_params
    ssid: 1
    key: 2
test1:
  name: connectivity
  <<: *wifi_params
test2:
  name: connectivity_5ghz
  <<: *wifi_params

load_example.py:

import yaml
import pprint

with open('example.yaml', 'r') as f:
    result = yaml.safe_load(f)
result.pop('__anchors__', None)
pprint.pprint(result)

prints:

{'test1': {'key': 2, 'name': 'connectivity', 'ssid': 1},
 'test2': {'key': 2, 'name': 'connectivity_5ghz', 'ssid': 1}}
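If you load several files this way, the same idea can be wrapped in a small helper. This is only a sketch: the function name load_without_anchors and the anchors_key parameter are made up for this example and are not part of PyYAML.

```python
import yaml


def load_without_anchors(stream, anchors_key="__anchors__"):
    """Load YAML and drop the top-level section that only holds anchor definitions."""
    data = yaml.safe_load(stream)
    # Only a top-level mapping can contain the section; pop() with a default
    # makes a missing section a no-op instead of a KeyError.
    if isinstance(data, dict):
        data.pop(anchors_key, None)
    return data


DOC = """\
__anchors__:
  wifi_parm: &wifi_params
    ssid: 1
    key: 2
test1:
  name: connectivity
  <<: *wifi_params
"""

result = load_without_anchors(DOC)
print(result)
```

The isinstance check is there because an empty file loads as None and a top-level sequence loads as a list, neither of which has a section to remove.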
Answered By: Bryan Roach