PyYAML dump does not produce anchors for the same objects

Question:

I was experimenting a bit with PyYAML and I wanted to have a reference to a value appearing previously in the YAML. To give an example:

import yaml
a=25
dict_to_dump={'a':a,'b':a}
yaml.dump(dict_to_dump)

From what I understood from the specification, PyYAML should add an anchor to each object that has already been encountered. In my case, I would expect the YAML file to contain:

a:&id 25
b:*id

as the objects passed are exactly the same, but instead I get:

a:25
b:25

How can I obtain the desired behaviour?

Asked By: Bomps

Answers:

25 isn’t an object*. Try with a list or dictionary and it works like you’re expecting:

import yaml

a = 25
dict_to_dump = {'a':a,'b':a}
print(yaml.dump(dict_to_dump))
# a: 25
# b: 25

a = 'string'
dict_to_dump = {'a':a,'b':a}
print(yaml.dump(dict_to_dump))
# a: string
# b: string

a = [1, 2, 3]
dict_to_dump = {'a':a,'b':a}
print(yaml.dump(dict_to_dump))
# a: &id001
# - 1
# - 2
# - 3
# b: *id001

* Okay, everything is an object in Python. In this case "object" refers to a JSON/YAML object, so a list or dictionary.
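
Note that the alias depends on both keys pointing at the same Python list, not just at lists with equal contents. A minimal sketch to illustrate this, assuming a reasonably recent PyYAML (the variable names are only for illustration):

```python
import yaml

shared = [1, 2, 3]

# Same list object referenced twice: PyYAML sees one object and
# emits an anchor for the first use and an alias for the second.
same_object_yaml = yaml.dump({'a': shared, 'b': shared})

# Two separate lists with equal contents: they are distinct objects,
# so each one is written out in full and no anchor appears.
equal_copies_yaml = yaml.dump({'a': [1, 2, 3], 'b': [1, 2, 3]})

print(same_object_yaml)
print(equal_copies_yaml)
```

So if you build the data by copying lists around, you will not see anchors even though the values look identical.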

Answered By: Woodford

First of all, your expectation is incorrect. What you could expect is

a: &id 25
b: *id

with a space after the value indicator (:).

You will also need to call yaml.dump(dict_to_dump, sys.stdout) to get any output from your program, and the
output you show is not what you actually get (it is again missing the space after the value indicator).


You normally only get an alias if you have two objects a and b with the same value for id(a) and id(b).
Simple objects like small integers and string literals (which CPython reuses from a pool) can have the same
id() even if assigned in different places in the source. Mutable structures like a dict or list, or instances
of Python classes, do not usually have the same id() unless they really are the same object.

PyYAML does know about this and handles some types of objects differently, even if the id() is the same.

import sys
import yaml
import datetime

a = 25
b = 25
c = 'some string specified twice in the source'
d = 'some string specified twice in the source'
e = datetime.date(2023, 1, 11)
f = datetime.date(2023, 1, 11)

print('a-b', id(a) == id(b))
print('c-d', id(c) == id(d))
print('e-f', id(e) == id(f))
print('=====')

dict_to_dump = dict(e=e, x=e, f=f)
yaml.dump(dict_to_dump, sys.stdout)

which gives:

a-b True
c-d True
e-f False
=====
e: &id001 2023-01-11
f: 2023-01-11
x: *id001
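
The different treatment comes from the representer's ignore_aliases() method, which in PyYAML's SafeRepresenter returns True for None, the empty tuple, and str, bytes, bool, int and float instances, so those are never anchored. As a rough sketch only (IntAnchorDumper is a made-up name, and this relies on overriding a method that is not prominently documented), you could let plain integers take part in alias detection like this:

```python
import yaml


class IntAnchorDumper(yaml.SafeDumper):
    """Hypothetical dumper that no longer excludes plain ints from aliasing."""

    def ignore_aliases(self, data):
        # bool is a subclass of int; keep the stock behaviour for True/False.
        if isinstance(data, int) and not isinstance(data, bool):
            return False
        # Everything else (None, str, float, ...) is handled as before.
        return super().ignore_aliases(data)


a = 25
int_alias_yaml = yaml.dump({'a': a, 'b': a}, Dumper=IntAnchorDumper)
print(int_alias_yaml)
```

Be aware of the side effect: because CPython caches small integers, every 25 in the data has the same id(), so unrelated occurrences of the same small number would end up as aliases of each other too, which is probably why PyYAML ignores these types by default.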

To get the expected output, one approach is to make a Python class Int that behaves like an integer.
Then, when you do a = Int(25), you will get your anchor and alias.

This is what my library ruamel.yaml does. When loading in the default round-trip mode, it also preserves the
actual anchor/alias names used:

import sys
import ruamel.yaml

yaml_str = """
a: &my_special_id 25
b: *my_special_id
"""

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
print(f'{data["a"] * 4  =}')
print(f'{data["b"] + 75 =}')
print('=====')
yaml.dump(data, sys.stdout)

which gives:

data["a"] * 4  =100
data["b"] + 75 =100
=====
a: &my_special_id 25
b: *my_special_id

Creating data from scratch is also possible:

import sys
import ruamel.yaml

Int = ruamel.yaml.scalarint.ScalarInt

a = Int(25, anchor='id')
data = dict(a=a, b=a)

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)

which gives what you expected in the first place:

a: &id 25
b: *id

Answered By: Anthon