pyyaml is producing undesired !!python/unicode output

Question:

I am using pyyaml to dump an object to a file. There are several unicode strings in the object. I’ve done this before, but now it’s producing output items like this:

'item': !!python/unicode "some string"

Instead of the desired:

'item': 'some string'

I want the output to be UTF-8. The command I currently use is:

yaml.dump(data,file(suite_out,'w'),encoding='utf-8',indent=4,allow_unicode=True)

In other locations I do the following and it works:

codecs.open(suite_out,"w","utf-8").write(
    yaml.dump(suite,indent=4,width=10000)
)

What am I doing wrong?

Python 2.7.3

Asked By: edA-qa mort-ora-y


Answers:

I tried many combinations and the only one I can find that consistently produces the correct YAML output is:

yaml.safe_dump(data, file(filename,'w'), encoding='utf-8', allow_unicode=True)
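`file()` exists only in Python 2. Below is a sketch of the same `safe_dump` call that should also run on Python 3. It assumes the output file can be opened in binary mode, because with `encoding='utf-8'` PyYAML writes encoded bytes to the stream. The path and sample data are illustrative, not from the question.

```python
import io
import os
import tempfile

import yaml

# Illustrative data: one ASCII string and one non-ASCII string.
data = {u'item': u'some string', u'city': u'Z\u00fcrich'}

# Hypothetical output path inside a fresh temporary directory.
filename = os.path.join(tempfile.mkdtemp(), 'suite_out.yaml')

# encoding='utf-8' makes PyYAML emit bytes, so the file is opened in binary
# mode; allow_unicode=True writes non-ASCII characters as-is instead of escapes.
with io.open(filename, 'wb') as f:
    yaml.safe_dump(data, f, encoding='utf-8', allow_unicode=True,
                   default_flow_style=False)

# Read the file back as text to inspect what was written.
with io.open(filename, encoding='utf-8') as f:
    text = f.read()
print(text)
```

Keep in mind that `safe_dump` raises an error for objects it has no safe representer for, such as instances of custom classes. That limitation is why the other answer registers a representer on the regular `dump` path instead.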

Answered By: edA-qa mort-ora-y

Inspired by the accepted answer (that safe_dump produces the expected result), I checked the source of python2.7/site-packages/yaml/representer.py. The representers used by dump and safe_dump use different represent functions for unicode: Representer tags ASCII-only unicode strings as !!python/unicode, while SafeRepresenter always emits a plain !!str.

The represent function can be overridden with add_representer. So you can take the represent function from SafeRepresenter and register it for use by dump.

I need to do this because I have some custom types, so I cannot use safe_dump.

The code is as follows:

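import yaml
# Same as SafeRepresenter: emit unicode as a plain !!str scalar.
def represent_unicode(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data)
# Register globally for yaml.Dumper, which yaml.dump uses (Python 2 only).
yaml.add_representer(unicode, represent_unicode)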

My command to produce the output:

yaml.dump(yml, encoding='utf-8', allow_unicode=True, default_flow_style=False, explicit_start=True)

Python version is 2.7.5; PyYAML is 3.10.
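Here is a minimal sketch of a variant that leaves the global `yaml.Dumper` unchanged. The same kind of representer is registered only on a `yaml.Dumper` subclass, which keeps the full Dumper's support for arbitrary Python objects. `PlainTextDumper` and `Point` are hypothetical names used for illustration. The sketch uses `unicode` on Python 2 and `str` on Python 3. On Python 3 the full Dumper already emits plain strings, so the registration there is harmless but unnecessary.

```python
import sys

import yaml

# Text type that gets !!python/unicode from the full Dumper on Python 2.
# The conditional expression evaluates only the branch it selects, so the
# name `unicode` is never looked up on Python 3.
TEXT_TYPE = unicode if sys.version_info[0] == 2 else str  # noqa: F821


class PlainTextDumper(yaml.Dumper):
    """Hypothetical full-Dumper subclass; still handles python/object tags."""


def represent_text(dumper, data):
    # Same tag SafeRepresenter uses: a plain !!str scalar.
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data)


# Registered on the subclass only; yaml.Dumper itself is left untouched.
yaml.add_representer(TEXT_TYPE, represent_text, Dumper=PlainTextDumper)


class Point(object):
    """Hypothetical custom type that safe_dump would refuse to serialize."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


data = {u'item': u'some string', u'origin': Point(0, 0)}
out = yaml.dump(data, Dumper=PlainTextDumper, default_flow_style=False)
if isinstance(out, bytes):  # Python 2's yaml.dump returns UTF-8 bytes by default
    out = out.decode('utf-8')
print(out)
```

The trade-off is that every call must pass `Dumper=PlainTextDumper`. Code that calls `yaml.dump` without it keeps the default behavior.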

Answered By: fefe