ruamel.yaml adds incorrect indentation indicator
Question:
We use Python 3 (3.10) and ruamel.yaml (0.17.21) to run some validation on Kubernetes YAML manifests generated from Helm.
One of them outputs a config that starts with a newline followed by an empty object {}.
import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.indent(sequence=4, offset=2)

# The literal block scalar starts with an empty line, followed by {}
input = """
data:
  abc.yaml: |

    {}
"""
data = yaml.load(input)
yaml.dump(data, sys.stdout)
The output is as follows, which is an invalid YAML document:
data:
  abc.yaml: |4

    {}
Is it possible to stop ruamel.yaml from adding the indentation indicator?
My current workaround is to not set the indentation with yaml.indent(), as the default indentation of 2 currently matches the indentation of abc.yaml.
This is not ideal, however, as we need these indentation settings in other parts of our code.
Answers:
You get the block indentation indicator (the number after the ‘|’) because the first line is empty. That indicator has to be there if the first line is more indented than any of the following lines (i.e. if the Python string you dump starts with one or more spaces).
I have not seen an empty line as the first line of a literal scalar before, but that is not the problem.
The problem is that the routine that determines whether a hint is necessary takes the sequence indent level (in your case 4) and returns that as the hint. If it does that, it should also indent the text accordingly, that is, put two more spaces before the {} text in the literal scalar, but it doesn’t.
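The mismatch described above can be checked with a little arithmetic. The following is a minimal sketch, not part of ruamel.yaml: the helper name `block_scalar_lines_fit` is hypothetical, and it assumes that the content of a block scalar must be indented by at least the parent node's indentation plus the value of the indicator.

```python
def block_scalar_lines_fit(parent_indent, indicator, lines):
    """Hypothetical check: are all non-blank lines indented far enough?

    Assumes the required content column is parent_indent + indicator.
    """
    required = parent_indent + indicator
    for line in lines:
        if line.strip(' ') == '':
            continue  # blank lines do not constrain the indentation
        leading = len(line) - len(line.lstrip(' '))
        if leading < required:
            return False
    return True


# Lines emitted after "  abc.yaml: |4" in the question; the key sits at column 2.
emitted = ["", "    {}"]
fits_with_4 = block_scalar_lines_fit(2, 4, emitted)  # needs column 6, {} is at 4
fits_with_2 = block_scalar_lines_fit(2, 2, emitted)  # needs column 4, {} is at 4
print(fits_with_4, fits_with_2)
```

Under that assumption, the emitted text only matches an indicator of 2, which is why the hint of 4 produces an invalid document.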
So that is a bug, but you can work around it by providing your own determine_block_hints routine that uses 2 instead of self.best_sequence_indent (which is set to 4 in your program):
import sys
import ruamel.yaml

yaml_str = """
data:
  - abc.yaml: |

      {}
  - q:
      - 42
"""

class MyEmitter(ruamel.yaml.emitter.Emitter):
    def determine_block_hints(self, text):
        indent = 0
        indicator = ''
        hints = ''
        if text:
            # text starts with a space or a line break: emit an indentation hint
            if text[0] in ' \n\x85\u2028\u2029':
                indent = 2  # replaced self.best_sequence_indent
                hints += str(indent)
            elif self.root_context:
                # look for a document start/end marker at the beginning of a line
                for end in ['\n---', '\n...']:
                    pos = 0
                    while True:
                        pos = text.find(end, pos)
                        if pos == -1:
                            break
                        try:
                            if text[pos + 4] in ' \r\n':
                                break
                        except IndexError:
                            pass
                        pos += 1
                    if pos > -1:
                        break
                if pos > 0:
                    indent = 2  # replaced self.best_sequence_indent
            # chomping indicator: strip ('-') or keep ('+') trailing line breaks
            if text[-1] not in '\n\x85\u2028\u2029':
                indicator = '-'
            elif len(text) == 1 or text[-2] in '\n\x85\u2028\u2029':
                indicator = '+'
        hints += indicator
        return hints, indent, indicator

yaml = ruamel.yaml.YAML()
yaml.Emitter = MyEmitter
yaml.indent(sequence=4, offset=2)
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
which gives:
data:
  - abc.yaml: |2

      {}
  - q:
      - 42
which is valid YAML.
In your case, the first non-empty line of the literal style block scalar determines the indent, so the hint is not really necessary. But the Python string (in text) could have a following line that is less indented:
abc: |1
  {}
 xyz
There the block indentation indicator is necessary.
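To see why the indicator matters in that example, here is a rough sketch of what happens with and without it. This is only an illustration, not how any particular parser is implemented; `autodetect_indent` is a hypothetical helper, and the sketch assumes a parser without an indicator infers the indentation from the first non-empty line.

```python
def autodetect_indent(block_lines):
    """Hypothetical: leading-space count of the first non-empty line."""
    for line in block_lines:
        if line.strip():
            return len(line) - len(line.lstrip(' '))
    return 0


# Raw lines that follow "abc: |1" in the example above.
lines = ["  {}", " xyz"]

# Without an indicator, the indentation would be inferred as 2, so the
# " xyz" line (one leading space) could no longer belong to the scalar.
detected = autodetect_indent(lines)

# With the explicit indicator 1, one space is stripped from every line,
# which keeps the extra leading space of the first line as content.
content = "\n".join(line[1:] for line in lines) + "\n"
print(detected, repr(content))
```

With the indicator, the loaded string keeps its leading space before the {} and the xyz line stays part of the scalar.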
However, you can see that the code only looks at the first character of the first line to determine the hint.
To get rid of the 2 in the output, you would need to check whether the first visible character occurs at the beginning of text or directly after a newline.
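A check along those lines could look like the sketch below. It is plain Python and does not depend on ruamel.yaml. The name `needs_indent_indicator` is hypothetical, and the rule is slightly stricter than the one described above: it asks for an indicator whenever any space occurs before the first visible character, so leading lines that consist only of spaces also trigger the hint.

```python
LINE_BREAKS = '\n\x85\u2028\u2029'  # same line-break characters the emitter tests for


def needs_indent_indicator(text):
    """Hypothetical: True if a space occurs before the first visible character."""
    for ch in text:
        if ch == ' ':
            return True   # leading space: indentation cannot be auto-detected safely
        if ch not in LINE_BREAKS:
            return False  # first visible character reached with no space before it
    return False          # empty text, or nothing but line breaks


print(needs_indent_indicator("\n{}\n"))      # the string from the question
print(needs_indent_indicator(" {}\nxyz\n"))  # first line more indented than the rest
```

In the overridden determine_block_hints, the `text[0] in ' \n\x85\u2028\u2029'` test could then be replaced by a call to this helper. That substitution is an untested assumption, so verify the dumped output by loading it again before relying on it.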