ruamel.yaml adds incorrect indentation indicator

Question:

We use Python3 (3.10) and ruamel.yaml (0.17.21) to run some validation on Kubernetes YAML manifests generated from Helm.

One of them outputs a config that starts with a newline followed by an empty object {}.

import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.indent(sequence=4, offset=2)

input = """
data:
  abc.yaml: |
    
    {}
"""

data = yaml.load(input)
yaml.dump(data, sys.stdout)

The output is as follows, which is not a valid YAML document:

data:
  abc.yaml: |4

    {}

Is it possible to stop ruamel.yaml from adding the indentation indicator?

My current workaround is to not set the indentation with yaml.indent() as the default indentation of 2 currently matches the indentation of abc.yaml.

This is not ideal, however, as we need these indentation settings in other parts of our code.

Asked By: wrdls


Answers:

You get the block indentation indicator (the number after the '|') because the first line
is empty. That indicator has to be there if the first line is more indented than any
of the following lines (i.e. if the Python string you dump starts with one or more spaces).
I have not seen an empty line as the first line of a literal scalar before, but that is not the problem.

The problem is that the routine that determines whether a hint is necessary takes the sequence
indent level (in your case 4) and returns that as the hint. If it does that, it should also indent
the text accordingly, i.e. put two more spaces before the {} text in the literal scalar, but it doesn't.
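
The mismatch can be checked with a little arithmetic. The sketch below assumes the YAML rule that block scalar content starts at the indentation of the parent node plus the explicit indicator. The function name content_column is hypothetical and not part of ruamel.yaml.

```python
def content_column(parent_indent: int, indicator: int) -> int:
    """Column at which block scalar content is expected to start.

    Assumes the YAML rule: indentation of the parent node plus the
    explicit indentation indicator.
    """
    return parent_indent + indicator


# In the question's output the key abc.yaml sits at column 2 and the
# header is "|4", so the content is expected at column 6.
expected = content_column(parent_indent=2, indicator=4)
actual = 4  # "{}" is emitted with only four leading spaces
print(expected, actual, expected == actual)  # 6 4 False
```

The same rule is consistent with the other examples in this answer: a key at column 4 with "|2" puts the content at column 6, and a key at column 0 with "|1" puts it at column 1.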

So that is a bug, but you can work around it by providing your own determine_block_hints routine
that uses 2 instead of self.best_sequence_indent (which is set to 4 in your program):

import sys
import ruamel.yaml

yaml_str = """
data:
  - abc.yaml: |
    
      {}
  - q:
      - 42
"""

class MyEmitter(ruamel.yaml.emitter.Emitter):
    def determine_block_hints(self, text):
        indent = 0
        indicator = ''
        hints = ''
        if text:
            # text starts with a space or line break: emit an indentation hint
            if text[0] in ' \n\x85\u2028\u2029':
                indent = 2 # replaced self.best_sequence_indent
                hints += str(indent)
            elif self.root_context:
                # look for a document marker at the start of a line in the text
                for end in ['\n---', '\n...']:
                    pos = 0
                    while True:
                        pos = text.find(end, pos)
                        if pos == -1:
                            break
                        try:
                            if text[pos + 4] in ' \r\n':
                                break
                        except IndexError:
                            pass
                        pos += 1
                    if pos > -1:
                        break
                if pos > 0:
                    indent = 2 # replaced self.best_sequence_indent
            # chomping indicator, based on the trailing line breaks of the text
            if text[-1] not in '\n\x85\u2028\u2029':
                indicator = '-'
            elif len(text) == 1 or text[-2] in '\n\x85\u2028\u2029':
                indicator = '+'
        hints += indicator
        return hints, indent, indicator

yaml = ruamel.yaml.YAML()
yaml.Emitter = MyEmitter

yaml.indent(sequence=4, offset=2)
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)

which gives:

data:
  - abc.yaml: |2

      {}
  - q:
      - 42

Which is valid YAML.

In your case, the first non-empty line of the literal style block scalar determines the indent, so
the hint is not really necessary. But the Python string (in text) could have a following line that is less indented:

abc: |1

  {}
 xyz

There the block indentation indicator is necessary.

However, you can see that the code only looks at the first character of the first line to determine the hint.
To get rid of the 2 in the output, you would need to check that the first visible character occurs
at the beginning of text or directly after a newline.
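
A minimal sketch of such a check, written as a standalone function so it can be tried without ruamel.yaml. The name needs_indent_indicator is hypothetical, and the sketch only treats the newline character as a line break, which is a simplification compared to the emitter code above.

```python
def needs_indent_indicator(text: str) -> bool:
    """Return True if a literal block scalar for `text` would need an
    explicit indentation indicator.

    Hypothetical helper, not part of ruamel.yaml. Only the newline
    character is treated as a line break here.
    """
    for line in text.split("\n"):
        if line.startswith(" "):
            # A leading space on or before the first content line would
            # mislead indentation auto-detection, so a hint is needed.
            return True
        if line:
            # The first non-empty line starts at column 0: no hint needed.
            return False
    return False  # empty text, or nothing but newlines


print(needs_indent_indicator("\n{}\n"))        # False: the question's string
print(needs_indent_indicator("\n {}\nxyz\n"))  # True: the "abc: |1" example
```

In a custom determine_block_hints, a check like this could stand in for the test on text[0] when deciding whether to emit the indentation hint. That wiring is an assumption and has not been verified against ruamel.yaml here.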

Answered By: Anthon