YAML loads 5e-6 as string and not a number
Question:
When I load a number in e-notation from a JSON dump with YAML, the number is loaded as a string and not as a float.
I think this simple example can explain my problem.
In [1]: import json
In [2]: import yaml
In [3]: All = {'one':1,'low':0.000001}
In [4]: jAll = json.dumps(All)
In [5]: yAll = yaml.safe_load(jAll)
In [6]: yAll
Out[6]: {'low': '1e-06', 'one': 1}
Why does YAML load 1e-06 as a string and not as a number? How can I fix it?
Answers:
The problem lies in the fact that the YAML Resolver is set up to match floats as follows:
Resolver.add_implicit_resolver(
u'tag:yaml.org,2002:float',
re.compile(u'''^(?:[-+]?(?:[0-9][0-9_]*)\.[0-9_]*(?:[eE][-+][0-9]+)?
|\.[0-9_]+(?:[eE][-+][0-9]+)?
|[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\.[0-9_]*
|[-+]?\.(?:inf|Inf|INF)
|\.(?:nan|NaN|NAN))$''', re.X),
list(u'-+0123456789.'))
whereas the YAML spec specifies the regex for scientific notation as:
-? [1-9] ( \. [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?
The latter makes the dot optional, which it isn't in the above re.compile() pattern in the implicit resolver.
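To see the mismatch without involving the loader, the pattern quoted above can be tried directly against a few literals. This is a minimal sketch using only the standard `re` module; the helper name `looks_like_pyyaml_float` is made up for illustration and is not part of PyYAML.

```python
import re

# Float pattern copied from the PyYAML resolver quoted above.
PYYAML_FLOAT = re.compile(r'''^(?:[-+]?(?:[0-9][0-9_]*)\.[0-9_]*(?:[eE][-+][0-9]+)?
    |\.[0-9_]+(?:[eE][-+][0-9]+)?
    |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\.[0-9_]*
    |[-+]?\.(?:inf|Inf|INF)
    |\.(?:nan|NaN|NAN))$''', re.X)


def looks_like_pyyaml_float(text):
    """Hypothetical helper: True if the pattern would tag `text` as a float."""
    return PYYAML_FLOAT.match(text) is not None


for sample in ('1e-06', '1.0e-06', '1.0e6', '1.0e+6'):
    print(sample, looks_like_pyyaml_float(sample))
```

Only the literals that have both a dot in the coefficient and a sign in the exponent match, which is why `1e-06` from `json.dumps` falls through to a plain string.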
The matching of floats can be fixed so that it accepts floating point values with an e/E but without a decimal dot, and with exponents without a sign (i.e. + implied):
import yaml
import json
import re
All = {'one': 1, 'low': 0.000001}
jAll = json.dumps(All)
# note: this registers the extra pattern on the SafeLoader class itself
loader = yaml.SafeLoader
loader.add_implicit_resolver(
    u'tag:yaml.org,2002:float',
    # raw string, so that \. reaches the regex engine unchanged
    re.compile(r'''^(?:
     [-+]?(?:[0-9][0-9_]*)\.[0-9_]*(?:[eE][-+]?[0-9]+)?
    |[-+]?(?:[0-9][0-9_]*)(?:[eE][-+]?[0-9]+)
    |\.[0-9_]+(?:[eE][-+][0-9]+)?
    |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\.[0-9_]*
    |[-+]?\.(?:inf|Inf|INF)
    |\.(?:nan|NaN|NAN))$''', re.X),
    list(u'-+0123456789.'))
data = yaml.load(jAll, Loader=loader)
print('data', data)
results in:
data {'one': 1, 'low': 1e-06}
There is a discrepancy between what JSON allows in numbers and the regex in the YAML 1.2 spec (concerning the required dot in the number and e being lower case).
The JSON specification is IMO very clear in that it doesn't require the dot before 'e/E', nor does it require a sign after the 'e/E'.
The PyYAML implementation matches floats partly according to the JSON spec and partly according to the regex, and fails on numbers that should be valid.
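The JSON side of this claim is easy to check with Python's standard `json` module, which accepts exponent literals with no dot and with or without a sign. The sample literals below are chosen for illustration only.

```python
import json

# JSON number literals without a dot and/or without an exponent sign.
samples = ['1e-06', '1E-06', '1e6', '1E+6', '5e-6']

# Parse each literal on its own; all of them are valid JSON numbers.
parsed = {text: json.loads(text) for text in samples}

for text, value in parsed.items():
    print(text, type(value).__name__, value)
```

Every literal comes back as a Python `float`, so a YAML loader that is meant to read JSON output needs to recognise the same forms.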
ruamel.yaml (which is my enhanced version of PyYAML) has this updated pattern and works correctly:
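import ruamel.yaml
import json
All = {'one': 1, 'low': 0.000001}
jAll = json.dumps(All)
# recent ruamel.yaml releases load through a YAML() instance
# instead of the older ruamel.yaml.load() function
yaml = ruamel.yaml.YAML(typ='safe')
data = yaml.load(jAll)
print('data', data)
with output:
data {'one': 1, 'low': 1e-06}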
ruamel.yaml also accepts the number ‘1.0e6’, which PyYAML also sees as a string.
I am new to using YAML, so I have no idea what is best, but writing either 1.0e-1 or 1.0E-1 in my YAML file has worked out of the box. That is, put a decimal point in the coefficient (without the decimal point, I also got strings).
I think that writing 1.0e-1 or 1.0E-1 has solved my problem. My code to read the YAML file looks like this:
import yaml
def read_config(path: str):
    """read yaml file"""
    with open(path, 'r') as f:
        data = yaml.safe_load(f)
    return data
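If the values are written out by a program rather than by hand, the "decimal point in the coefficient" rule can be applied when formatting. The following is only a sketch under that assumption: `float_with_dot` is a hypothetical helper, it relies on Python's `repr` always putting a sign in the exponent, and it does not handle `inf` or `nan`.

```python
def float_with_dot(value):
    """Hypothetical helper: format a finite float so that e-notation
    always has a decimal point in the coefficient (1e-06 -> 1.0e-06)."""
    text = repr(float(value))
    if 'e' in text:
        mantissa, exponent = text.split('e')
        if '.' not in mantissa:
            # add the dot that PyYAML's float pattern requires
            mantissa += '.0'
        text = mantissa + 'e' + exponent
    return text


for number in (0.000001, 1.5e-07, 1e16, 0.5):
    print(number, '->', float_with_dot(number))
```

The resulting text still converts back to the same float, so it can be written into a YAML file in place of the bare `repr` output.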