Tool to automatically expand YAML merges?
Question:
I’m looking for a tool or process which can easily take a YAML file which contains anchors, aliases and merge keys and expand the aliases and merges out into a flat YAML file. There are still many commonly used YAML parsers which don’t fully support merging.
I’d like to be able to take advantage of merging to keep things DRY, but there are instances where this needs to then be built into a more verbose “flat” YAML file so that it can be used by other tooling which relies on incomplete YAML parsers.
Example Source YAML:
default: &DEFAULT
  URL: website.com
  mode: production
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600
development:
  <<: *DEFAULT
  URL: website.local
  mode: dev
test:
  <<: *DEFAULT
  URL: test.website.qa
  mode: test
Desired output YAML:
default:
  URL: website.com
  mode: production
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600
development:
  URL: website.local
  mode: dev
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600
test:
  URL: test.website.qa
  mode: test
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600
Answers:
If you have Python installed on your system, you can do pip install ruamel.yaml.cmd ¹ and then:
yaml merge-expand input.yaml output.yaml
(replace output.yaml with - to write to stdout). This implements the merge expanding with preservation of key order and comments.
The above is actually a few lines of code that utilize ruamel.yaml ¹, so if you have Python (2.7 or 3.4+), install that using pip install ruamel.yaml, and save the following as expand.py:
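import sys
from ruamel.yaml import YAML

# the safe loader resolves merge keys while constructing plain dicts
yaml = YAML(typ='safe')
yaml.default_flow_style = False  # write block style, not {inline: maps}
with open(sys.argv[1]) as fp:
    data = yaml.load(fp)
with open(sys.argv[2], 'w') as fp:
    yaml.dump(data, fp)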
you can already do:
python expand.py input.yaml output.yaml
That will get you YAML that is semantically equivalent to what you requested (in output.yaml the keys of the mappings are sorted; in the output you requested they are not).
The above assumes you don’t have any tags in your YAML, nor care about preserving any comments. Most of those, and the key ordering, can be preserved by using a patched version of the standard YAML() instance. Patching is necessary because the standard YAML() instance preserves the merges on round-trip as well, which is exactly what you don’t want:
import sys
from ruamel.yaml import YAML, SafeConstructor

yaml = YAML()
yaml.Constructor.flatten_mapping = SafeConstructor.flatten_mapping
yaml.default_flow_style = False
yaml.allow_duplicate_keys = True
# comment out next line if you want "normal" anchors/aliases in your output
yaml.representer.ignore_aliases = lambda x: True
with open(sys.argv[1]) as fp:
    data = yaml.load(fp)
with open(sys.argv[2], 'w') as fp:
    yaml.dump(data, fp)
with this input:
default: &DEFAULT
  URL: website.com
  mode: production
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600 # an hour?
development:
  <<: *DEFAULT
  URL: website.local # local web
  mode: dev
test:
  <<: *DEFAULT
  URL: test.website.qa
  mode: test
that will give this output (note that comments on the merged in keys get duplicated):
default:
  URL: website.com
  mode: production
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600 # an hour?
development:
  URL: website.local # local web
  mode: dev
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600 # an hour?
test:
  URL: test.website.qa
  mode: test
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600 # an hour?
The above is what the yaml merge-expand command, mentioned at the start of this answer, does.
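To make the rule behind that output concrete, here is a minimal pure-Python sketch of merge-key expansion. It is not how ruamel.yaml implements it; it assumes data that has already been parsed by a loader that left `<<` behind as an ordinary key, and the helper name `expand_merges` is hypothetical. Explicit keys come first and win over merged ones, and when several mappings are merged the earlier ones take precedence.

```python
def expand_merges(node):
    """Return a copy of node with every '<<' merge key expanded (illustrative helper)."""
    if isinstance(node, list):
        return [expand_merges(item) for item in node]
    if not isinstance(node, dict):
        return node
    # Explicit keys are copied first, so they keep their position and their value.
    result = {k: expand_merges(v) for k, v in node.items() if k != "<<"}
    sources = node.get("<<", [])
    if isinstance(sources, dict):
        sources = [sources]  # a single merged mapping rather than a list of them
    for source in sources:
        for key, value in expand_merges(source).items():
            # setdefault never overwrites: explicit keys and earlier sources win.
            result.setdefault(key, value)
    return result


default = {"URL": "website.com", "mode": "production", "site_name": "Website"}
config = {
    "default": default,
    "development": {"<<": default, "URL": "website.local", "mode": "dev"},
}
flat = expand_merges(config)
print(flat["development"])
```

The overridden keys (URL, mode) stay in front and the merged-in keys follow, which matches the ordering of the expanded output shown above.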
¹ Disclaimer: I am the author of that package.
UPDATE: 2019-03-13 12:41:05
- This answer was modified pursuant to a comment by Anthon which correctly identified limitations with PyYAML. (See Pitfalls infra).
Context
- YAML file
- Python for parsing the YAML
Problem
- User jtYamlEnthusiast wishes to output a non-DRY version of a YAML file with aliases, anchors, and merge keys.
Solution(s)
- Alternative 1: use the ruamel library promoted by Anthon infra.
- Alternative 2: use Python pprint.pformat and simply do a load/dump round-trip transformation.
Rationale
- the ruamel library is great if you have the discretion to install another python library besides pyyaml, and you want a high degree of control over “round-trip” YAML transformations (such as the preservation of YAML comments, for example).
- if you do not need rigorous control over round-tripped YAML, or you are limited for some other reason to pyyaml, you can simply load and dump YAML directly, in order to obtain the “non-DRY” output.
Pitfalls
- as of this writing PyYAML has limitations relative to the ruamel library, regarding the handling of YAML v1.1 and YAML v1.2
Example
##
import pprint
import yaml
##
myrawyaml = '''
default: &DEFAULT
  URL: website.com
  mode: production
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600
development:
  <<: *DEFAULT
  URL: website.local
  mode: dev
test:
  <<: *DEFAULT
  URL: test.website.qa
  mode: test
'''
##
pynative = yaml.safe_load(myrawyaml)
vout = pprint.pformat(pynative)
print(vout) ##=> this is non-DRY and just happens to be well-formed YAML syntax
print(yaml.safe_load(vout)) ##=> this proves we have well-formed YAML if it loads without exception
If you for some reason have a use case where you need to write the expanded YAML back to a file as YAML, you can:
- Use @Anthon’s answer. As noted above, though, this approach might not be feasible if you can’t install packages.
- Use @dreftymac’s answer. It appears that this answer has worked for some people, but it didn’t work for me; by my understanding, pprint.pformat returns the argument as a string of its Python representation, and yaml.safe_load expects the Python representation itself. Of course, you could eval the string returned by pprint.pformat, but using eval on even trusted input feels icky. (Again, the answer has a couple of upvotes so maybe I’m missing something here.)
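On the eval point: if all you need is the Python object back from the pformat string, the standard library's ast.literal_eval parses literals only (dicts, lists, strings, numbers and the like) and refuses anything else, so it avoids the risk of eval. This is only a sketch with a made-up dictionary standing in for loaded YAML data, and it does not by itself produce YAML.

```python
import ast
import pprint

# Stand-in for data that was already loaded from YAML (values are illustrative).
data = {"development": {"URL": "website.local", "mode": "dev", "some_other_setting": 3600}}

text = pprint.pformat(data)        # a str holding the Python literal form
restored = ast.literal_eval(text)  # evaluates literals only, never arbitrary code

print(type(text).__name__)
print(restored == data)
```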
Alternatively, you can do what I did:
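import json
import yaml

def expand_yml(yml):
    # yml is the already-loaded Python object, not the raw YAML text;
    # the JSON round trip copies shared nodes, so no aliases are dumped
    return yaml.dump(json.loads(json.dumps(yml)))

expand_yml(my_yml_with_aliases)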
Since JSON can (with some exceptions, such as aliases) be regarded as a strict subset of YAML, this approach should generally work. However, if performance is a concern, or if you’re dealing with hairier YAML, this approach might not work for you.
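A standard-library sketch of why the round trip removes aliases, under the assumption that the loader hands back the same Python object for an anchor and each of its aliases (PyYAML does, and its dumper writes an anchor/alias pair when it meets the same container twice). The dictionaries below are stand-ins for loaded data, not output from a real parse.

```python
import json

# One shared mapping referenced from two places, as an anchor plus alias would load.
shared = {"URL": "website.com", "mode": "production"}
loaded = {"default": shared, "fallback": shared}
print(loaded["default"] is loaded["fallback"])  # same object: a dumper would alias it

# Serialising to JSON text and parsing it again builds independent copies.
flat = json.loads(json.dumps(loaded))
print(flat["default"] is flat["fallback"])
print(flat == loaded)

# One of the "hairier" cases: JSON object keys are always strings.
converted = json.loads(json.dumps({1: "one"}))
print(converted)
```

Values that JSON cannot encode, such as the date objects a YAML loader may produce for timestamps, make json.dumps raise a TypeError, so the trick is best kept to plain string, number, list and mapping data.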
I did the expansion of anchors in yaml recently using
yq 'explode(.)' input.yaml > output.yaml
This is using the golang yq.