Combining Dumper class with string representer to get exact required YAML output
Question:
I’m using PyYAML 6.0 with Python 3.9.
In order, I am trying to…
- Create a YAML list
- Embed this list as a multi-line string in another YAML object
- Replace this YAML object in an existing document
- Write the document back, in a format that will pass YAML 1.2 linting
I have the process working, apart from the YAML 1.2 requirement, with the following code:
import yaml


def str_presenter(dumper, data):
    """configures yaml for dumping multiline strings
    Ref: https://stackoverflow.com/questions/8640959/how-can-i-control-what-scalar-form-pyyaml-uses-for-my-data"""
    if data.count('\n') > 0:  # check for multiline string
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)


yaml.add_representer(str, str_presenter)
yaml.representer.SafeRepresenter.add_representer(
    str, str_presenter)


class DoYamlStuff:
    @staticmethod  # takes no self/cls; it is only ever called through the class
    def post_renderers(images):
        return yaml.dump([
            {
                "op": "replace",
                "path": "/spec/postRenderers",
                "value": [
                    {
                        "kustomize": {
                            "images": images
                        }
                    }
                ]
            }])

    @classmethod
    def images_patch(cls, chart, images, ecr_url):
        return {
            "target": {
                "kind": "HelmRelease",
                "name": chart,
                "namespace": chart
            },
            "patch": cls.post_renderers([x.patch(ecr_url) for x in images])
        }
This produces something like this:
- patch: |
    - op: replace
      path: /spec/postRenderers
      value:
      - kustomize:
          images:
          - name: nginx:latest
            newName: 12345678910.dkr.ecr.eu-west-1.amazonaws.com/nginx
            newTag: latest
  target:
    kind: HelmRelease
    name: nginx
    namespace: nginx
As you can see, that’s mostly working. Valid YAML, does what it needs to, etc.
Unfortunately… it doesn’t indent the list item by 2 spaces, so the YAML linter in our repository’s pre-commit then adjusts everything. Makes the repo messy, and causes PRs to regularly include changes that aren’t relevant.
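The indentation behaviour can be reproduced in isolation. This is a minimal sketch, assuming stock PyYAML with no custom Dumper; the `doc` dictionary is only an illustrative stand-in for the real patch data:

```python
import yaml  # PyYAML

# Illustrative data only: a mapping whose values contain nested lists.
doc = {"value": [{"kustomize": {"images": [{"name": "nginx:latest"}]}}]}

# By default PyYAML writes block sequences inside mappings "indentless":
# the dash sits in the same column as the parent key.
default_output = yaml.dump(doc)
print(default_output)
# value:
# - kustomize:
#     images:
#     - name: nginx:latest
```

A linter configured to expect indented sequences will want the dashes moved two columns to the right, which is the reformatting the pre-commit hook keeps applying.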
I then set out to implement this PrettyDumper class from StackOverflow. This reversed the effects – my indentation is now right, but my scalars aren’t working at all:
- patch: "- op: replace\n path: /spec/postRenderers\n value:\n - kustomize:\n
    images:\n - name: nginx:latest\n
    newName: 793961818876.dkr.ecr.eu-west-1.amazonaws.com/nginx\n
    newTag: latest\n"
  target:
    kind: HelmRelease
    name: nginx
    namespace: nginx
I have tried to merge the str_presenter function with the PrettyDumper class, but the scalars still don’t work:
import yaml.emitter
import yaml.serializer
import yaml.representer
import yaml.resolver


class IndentingEmitter(yaml.emitter.Emitter):
    def increase_indent(self, flow=False, indentless=False):
        """Ensure that lists items are always indented."""
        return super().increase_indent(
            flow=False,
            indentless=False,
        )


class PrettyDumper(
    IndentingEmitter,
    yaml.serializer.Serializer,
    yaml.representer.Representer,
    yaml.resolver.Resolver,
):
    def __init__(
        self,
        stream,
        default_style=None,
        default_flow_style=False,
        canonical=None,
        indent=None,
        width=None,
        allow_unicode=None,
        line_break=None,
        encoding=None,
        explicit_start=None,
        explicit_end=None,
        version=None,
        tags=None,
        sort_keys=True,
    ):
        IndentingEmitter.__init__(
            self,
            stream,
            canonical=canonical,
            indent=indent,
            width=width,
            allow_unicode=allow_unicode,
            line_break=line_break,
        )
        yaml.serializer.Serializer.__init__(
            self,
            encoding=encoding,
            explicit_start=explicit_start,
            explicit_end=explicit_end,
            version=version,
            tags=tags,
        )
        yaml.representer.Representer.__init__(
            self,
            default_style=default_style,
            default_flow_style=default_flow_style,
            sort_keys=sort_keys,
        )
        yaml.resolver.Resolver.__init__(self)
        yaml.add_representer(str, self.str_presenter)
        yaml.representer.SafeRepresenter.add_representer(
            str, self.str_presenter)

    def str_presenter(self, data):
        print(data)
        """configures yaml for dumping multiline strings
        Ref: https://stackoverflow.com/questions/8640959/how-can-i-control-what-scalar-form-pyyaml-uses-for-my-data"""
        if data.count('\n') > 0:  # check for multiline string
            return self.represent_scalar('tag:yaml.org,2002:str', data, style='|')
        return self.represent_scalar('tag:yaml.org,2002:str', data)
If I could merge these two approaches into the PrettyDumper class, I think it would do everything I require. Can anyone point me in the right direction?
Answers:
If you need to pass your output through YAML 1.2 linting, you should not use PyYAML, as it only supports (a subset of) YAML 1.1. ruamel.yaml can handle more, e.g. using a sequence as a mapping key, something that PyYAML cannot handle at all, although it is valid YAML 1.1. Apart from that, it supports, and defaults to, YAML 1.2 loading/dumping (disclaimer: I am the author of that package).
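Both points can be seen with a few lines of PyYAML. This is a small sketch that only assumes PyYAML is installed; the input strings are made up for illustration:

```python
import yaml  # PyYAML

# YAML 1.1 resolves yes/no/on/off to booleans; YAML 1.2 keeps them as strings.
loaded = yaml.safe_load("enabled: yes")
print(loaded)  # {'enabled': True}

# A sequence as a mapping key is valid YAML, but PyYAML cannot build the
# mapping because the key becomes an unhashable Python list.
problem = None
try:
    yaml.safe_load("? [a, b]\n: value\n")
except yaml.constructor.ConstructorError as exc:
    problem = exc.problem
print(problem)
```

The first result is the kind of YAML 1.1 behaviour that a YAML 1.2 oriented linter or consumer may not agree with.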
Over the years, ruamel.yaml’s round-trip mode, which was originally built to preserve comments, has been extended and now handles superfluous quotes, anchor/alias name preservation, different formats of string scalars, integers and floats, etc. You can use its underlying technology to easily get what you want, without mucking with representers:
import sys
import io
import ruamel.yaml

images = [
    dict(name='nginx:latest', newName='12345678910.dkr.ecr.eu-west-1.amazonaws.com/nginx', newTag='latest'),
]
chart = 'nginx'


def data_as_literal_scalar(d):
    """dump a data structure d and make it a literal scalar string for further dumping"""
    yaml = ruamel.yaml.YAML()
    yaml.indent(sequence=4, offset=2)  # this indents even the root sequence by 2 extra positions
    buf = io.StringIO()
    yaml.dump(d, buf)
    v = ''.join([x[2:] for x in buf.getvalue().splitlines(True)])  # strip extra positions
    return ruamel.yaml.scalarstring.LiteralScalarString(v)


data = [dict(
    patch=data_as_literal_scalar([{
        "op": "replace",
        "path": "/spec/postRenderers",
        "value": [
            {
                "kustomize": {
                    "images": images
                }
            }
        ]
    }]),
    target={
        "kind": "HelmRelease",
        "name": chart,
        "namespace": chart
    },
)]

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)
which gives:
- patch: |
    - op: replace
      path: /spec/postRenderers
      value:
        - kustomize:
            images:
              - name: nginx:latest
                newName: 12345678910.dkr.ecr.eu-west-1.amazonaws.com/nginx
                newTag: latest
  target:
    kind: HelmRelease
    name: nginx
    namespace: nginx
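If switching libraries is not an option, the two PyYAML pieces from the question can probably be combined. The sketch below is not a recommendation. It assumes the multi-line strings were ignored because the representer was registered on the default Dumper classes rather than on the custom one, and it registers the representer on the custom class instead. `IndentedDumper` is a hypothetical name, the data is illustrative, and the output still comes from a YAML 1.1 library.

```python
import yaml  # PyYAML


class IndentedDumper(yaml.Dumper):
    """Hypothetical Dumper that never writes indentless block sequences."""

    def increase_indent(self, flow=False, indentless=False):
        # Ignore the emitter's request for an indentless sequence.
        return super().increase_indent(flow, False)


def str_presenter(dumper, data):
    # Use literal block style (|) for multi-line strings only.
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


# Register on the custom class, not on yaml.Dumper or SafeRepresenter.
IndentedDumper.add_representer(str, str_presenter)

inner = yaml.dump(
    [{"op": "replace", "path": "/spec/postRenderers", "value": [{"name": "nginx"}]}],
    Dumper=IndentedDumper,
)
outer = yaml.dump(
    [{"patch": inner, "target": {"kind": "HelmRelease"}}],
    Dumper=IndentedDumper,
)
print(outer)
```

Passing `Dumper=IndentedDumper` to both `yaml.dump` calls matters here: the inner call controls the list indentation inside the patch string, and the outer call is the one that picks the literal block style.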