Preserving a long single-line string as is when round-tripping in ruamel.yaml
Question:
In the code below, I'm trying to load a YAML string and write it back, to ensure that it retains the spacing as is.
import sys
import ruamel.yaml
yaml_str = """
long_single_line_text: Hello, there. This is nothing but a long single line text which is more that a 100 characters long
"""
yaml = ruamel.yaml.YAML() # defaults to round-trip
yaml.preserve_quotes = True
yaml.allow_duplicate_keys = True
yaml.explicit_start = True
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
And the result is
long_single_line_text: Hello, there. This is nothing but a long single line text which
is more that a 100 characters long
Here the line breaks at around character 87. I'm not sure if this is a setting that can be configured, but keeping the long line as is would help me avoid huge diffs when adding new keys.
If I set a longer width via yaml.width, then the multi-line strings become one long single-line string, so I can't do that.
Is there any way I can keep the string as is for long single-line scalars?
Answers:
There is only one parameter for wrapping: .width. It determines the wrapping point for scalars (quoted and unquoted) and for flow style collections. If your input was wrapped inconsistently, then that is normalized, just like inconsistent indentation would be.
One of the reasons ruamel.yaml was changed to use a new API is that before (as in PyYAML), introducing new parameters was difficult, since they needed to be passed from Loader instances to all the other instances it invokes (Parser, Constructor, Composer, Resolver, Serializer, etc.) during construction, resulting in code changes in many files.
The .width parameter controls self.best_width within the Emitter. In the answer here, the alternative writer for double_width_scalars uses that in the if statement:
if (
    0 < end < len(text) - 1
    and (ch == u' ' or start >= end)
    and self.column + (end - start) > self.best_width
    and split
):
If you replace self.best_width in there by self.dumper.dqwidth, and replace yaml.width = 27 with yaml.dqwidth = 27, then only your double quoted scalars will be wrapped at 27.
In general, in the new API, the instances used during loading and dumping are non-fleeting (and therefore have an explicit initialisation before every use), and within such an instance you can access the YAML() instance using self.loader or self.dumper, respectively.