ruamel.yaml: pin comment to next data item instead of previous one

Question:

I observed some confusing behavior when using ruamel.yaml with the round-trip loader.
It’s probably because it’s not trivial for ruamel.yaml to automatically determine which data item a comment should be attached to.

In the example below, I would like to always keep the comment. That should be possible if I could tell ruamel.yaml to treat every comment as attached to the next data item (i.e. the comment precedes "other").

Can this be done?

If yes: How?

data_was_a_dict = """
---
data: was_a_dict
main_dict:
  data: 
    some: data

# this comment gets always lost
other: data
"""

data_was_a_str = """
---
data: was_a_str
main_dict:
  data: a_string  

# this gets shifted or stays
other: data
"""

import ruamel.yaml, sys


yaml = ruamel.yaml.YAML()
for text in [data_was_a_dict, data_was_a_str]:
    for new_data in ["new_text", {"something": "else"}]:
        data = yaml.load(text)

        data["main_dict"]["data"] = new_data
        yaml.dump(data, sys.stdout)
        print("==========================")

Output:

data: was_a_dict
main_dict:
  data: new_text
other: data
==========================
data: was_a_dict
main_dict:
  data:
    something: else
other: data
==========================
data: was_a_str
main_dict:
  data: new_text

# this gets shifted or stays
other: data
==========================
data: was_a_str
main_dict:
  data:

# this gets shifted or stays
    something: else
other: data
==========================

========================================

Update thanks to Anthon:

def replace_dict(target, key, new_dict):
    def get_last_key(dct):
        keys = [key for key in dct.keys()]
        return keys[-1]
        
    old_dict = target[key]
    if old_dict and new_dict:
        # if new_dict is empty, we will lose the comment. 
        # That's fine for now since this should not happen in my case and I don't know yet where to attach 
        # the comment in that case
        last_comment = old_dict.ca.items.get(get_last_key(old_dict), None)
        if last_comment:
            actual_comment = last_comment[2]
            actual_comment.value = clean_comment(actual_comment.value)
            if actual_comment.value:
                if not isinstance(new_dict, ruamel.yaml.comments.CommentedMap):
                    new_dict = ruamel.yaml.comments.CommentedMap(new_dict)                
                new_dict.ca.items[get_last_key(new_dict)] = last_comment
    target[key] = new_dict
    
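def clean_comment(txt: str) -> str:
    # Drop the end-of-line part (everything before the first newline)
    # and keep only the comment lines that follow it.
    _, _, after = txt.partition("\n")
    if after:
        return "\n" + after
    return ""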

data_was_a_dict = """
---
main_dict:
  place: holder
  sub_dict: # this stays
    item1: value1
    item2: value2 # this is removed
    
# this stays    
other: data
"""

import ruamel.yaml, sys
import json

yaml = ruamel.yaml.YAML()

data = yaml.load(data_was_a_dict)
replace_dict(data["main_dict"], "sub_dict", {"item_new": "value_new"})

yaml.dump(data, sys.stdout)

gives

main_dict:
  place: holder
  sub_dict: # this stays
    item_new: value_new
# this stays    
other: data

Asked By: matthias

||

Answers:

I am not sure what is confusing about the following documented behaviour regarding the preservation of comments:

This preservation is normally not broken unless you severely alter the structure of a component
(delete a key in a dict, remove list entries). Reassigning values or replacing list items, etc., is fine.

In three of the four combinations that you dump you have first either replaced a
simple value by a composite value, or else removed the composite value that contains
the comment information altogether.

In all versions up to the current one (i.e. <0.18), ruamel.yaml attaches a scanned
comment to a token that already exists at the time the comment is parsed. There is no
token (yet) for your next data item, so there is currently no way to attach the comment
to "the next data item". The actual comment information in ruamel.yaml<0.18 is
an extended end-of-line comment with a value something like "\n\n# this gets shifted or stays".
Because that value starts with a newline, there is no actual
comment at the end of the line of the key it is associated with.
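
To make the shape of such a value concrete, here is a minimal sketch in plain Python (no ruamel.yaml needed). The helper name `split_extended_comment` is hypothetical and not part of any library; it only illustrates that everything before the first newline is the real end-of-line comment and everything after belongs to the following lines.

```python
from typing import Tuple


def split_extended_comment(value: str) -> Tuple[str, str]:
    """Split a comment value into (end-of-line part, following lines).

    Hypothetical helper for illustration only: the part before the first
    newline sits on the same line as the key; the rest (including that
    newline) is the block of blank lines and comments that follows.
    """
    eol, newline, rest = value.partition("\n")
    return eol, newline + rest


# The value from the question starts with a newline, so the EOL part is empty.
eol_part, following = split_extended_comment("\n\n# this gets shifted or stays\n")
print(repr(eol_part))
print(repr(following))
```

When the end-of-line part is empty, as here, the whole token is really a free-standing comment that merely happens to be stored on the preceding key.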

In your data_was_a_dict document the comment is associated with the key some, so whether you replace
the CommentedMap (a dict subtype, with comments on
its .ca attribute) with a string or a dict doesn’t make a difference: the data structure holding the comment is completely replaced.

In your data_was_a_str YAML document the comment is associated with the key data
on a CommentedMap one level higher than in the other document. If you replace
its value with another string, the output will be similar to the input. If you
add a whole new substructure, the comment is interpreted as sitting between the
key and its (composite) value.

To get what you seem to expect, you have to check that there is a comment associated with the
key data and move that to be associated with the key something, which could
not be a key on a normal dict (it would have to be a CommentedMap).
In the combinations where you delete/overwrite the data structure to which the comment is attached,
you would have to check for a comment and move it before deletion. In the combination where you
replace the simple value with a composite one, you could move the comment after the assignment
(given a suitable composite like CommentedMap). So yes, what you want is possible, but it is not
trivial, and it relies on undocumented features that will change in upcoming versions.

import sys
import ruamel.yaml

data_was_a_dict = """
data: was_a_dict
main_dict:
  data: 
    some: data

# this comment gets always lost
other: data
"""

data_was_a_str = """
data: was_a_str
main_dict:
  data: a_string  

# this gets shifted or stays
other: data
"""

yaml = ruamel.yaml.YAML()

data = yaml.load(data_was_a_dict)
print(data['main_dict']['data'].ca)
data = yaml.load(data_was_a_str)
print(data['main_dict'].ca)

which gives:

Comment(comment=None,
  items={'some': [None, None, CommentToken('\n\n# this comment gets always lost\n', line: 5, col: 0), None]})
Comment(comment=None,
  items={'data': [None, None, CommentToken('\n\n# this gets shifted or stays\n', line: 4, col: 0), None]})

I am looking into replacing the comment scanning and attachment in ruamel.yaml with something that will allow
the user to optionally split comments (on the first empty line) and to
specify whether comment info is attached to the previous and/or following "data".
Assignment could potentially be influenced by indent level. The documentation
might even be updated to reflect that round-tripping would then place less
severe restrictions on preserving comments when restructuring your data.

Answered By: Anthon