I have a Python test script that requires a configuration file. The configuration file is expected to be in JSON format.
But some of the users of my test script dislike the JSON format because it’s unreadable.
So I changed my test script so that it expects the configuration file in YAML format, then converts the YAML file to a JSON file.
I would prefer that the function that loads the configuration file handle both JSON and YAML. Is there a method in either the yaml or json module that returns a Boolean indicating whether the configuration file is JSON or YAML?
My workaround right now is to use two try/except clauses:
    import json
    import sys

    import yaml

    # This is the configuration file - my script gets it from argparse, but in
    # this example, let's just say it is some file whose format I don't know.
    config_file = "some_config_file"

    in_fh = open(config_file, "r")

    config_dict = dict()
    valid_json = True
    valid_yaml = True

    try:
        config_dict = json.load(in_fh)
    except ValueError:
        print("Error trying to load the config file in JSON format")
        valid_json = False

    # json.load() consumed the file, so rewind before the second attempt
    in_fh.seek(0)

    try:
        config_dict = yaml.safe_load(in_fh)
    except yaml.YAMLError:
        print("Error trying to load the config file in YAML format")
        valid_yaml = False

    in_fh.close()

    if not valid_yaml and not valid_json:
        print("The config file is neither JSON nor YAML")
        sys.exit(1)
Now, there is a Python module I found on the Internet called isityaml that can be used to test for YAML. But I’d prefer not to install another package because I have to install this on several test hosts.
Do the json and yaml modules have a method that returns a Boolean after testing for their respective formats?
    config_file = "sample_config_file"

    # I would like some method like this
    if json.is_json(in_fh):
        config_dict = json.load(in_fh)
From looking at the json and yaml modules’ documentation, it looks like they don’t offer any appropriate method. However, a common Python idiom is EAFP (“easier to ask forgiveness than permission”); in other words, go ahead and try to do the operation, and deal with exceptions if they arise.
    import json

    import yaml


    def load_config(config_file):
        with open(config_file, "r") as in_fh:
            # Read the file into memory as a string so that we can try
            # parsing it twice without seeking back to the beginning and
            # re-reading.
            config = in_fh.read()

        config_dict = dict()
        valid_json = True
        valid_yaml = True

        try:
            config_dict = json.loads(config)
        except ValueError:
            print("Error trying to load the config file in JSON format")
            valid_json = False

        try:
            config_dict = yaml.safe_load(config)
        except yaml.YAMLError:
            print("Error trying to load the config file in YAML format")
            valid_yaml = False

        if not valid_json and not valid_yaml:
            raise ValueError("The config file is neither JSON nor YAML")
        return config_dict
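A variant of the same idea, as a rough sketch rather than a definitive implementation: try each parser in order and stop at the first one that succeeds, which also tells you which format matched. The names load_first_match and parse_key_value are hypothetical, and the key=value parser is only a standard-library stand-in so the example runs without extra packages. With PyYAML installed you would presumably pass ("yaml", yaml.safe_load, yaml.YAMLError) as the second entry instead.

```python
import json


def parse_key_value(text):
    """Stand-in second parser (hypothetical): one 'key = value' pair per line."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=" not in line:
            raise ValueError("expected 'key = value', got: %r" % line)
        key, value = line.split("=", 1)
        result[key.strip()] = value.strip()
    return result


def load_first_match(text, parsers):
    """Try each (name, loader, error_type) in order and return (name, data).

    Raises ValueError if no parser accepts the text.
    """
    errors = []
    for name, loader, error_type in parsers:
        try:
            return name, loader(text)
        except error_type as exc:
            errors.append("%s: %s" % (name, exc))
    raise ValueError("no parser accepted the input (" + "; ".join(errors) + ")")


# Order matters: the stricter format goes first.
PARSERS = [
    ("json", json.loads, ValueError),
    ("key-value", parse_key_value, ValueError),
]

fmt, config = load_first_match('{"retries": 3}', PARSERS)
print(fmt, config)
```

JSON is tried first on purpose: a YAML parser accepts most JSON documents, so trying YAML first would hide the fact that the input was JSON.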
You could make your own is_json or is_yaml function if you wanted. This would involve processing the configuration twice (once to test it and once to load it), but that may be okay for your purposes.
    import json

    import yaml


    def try_as(loader, s, on_error):
        try:
            loader(s)
            return True
        except on_error:
            return False


    def is_json(s):
        return try_as(json.loads, s, ValueError)


    def is_yaml(s):
        # yaml.YAMLError is the base class of PyYAML's scanner and parser errors
        return try_as(yaml.safe_load, s, yaml.YAMLError)
Finally, as @user2357112 alluded to, “every JSON file is also a valid YAML file” (as of YAML 1.2), so you should be able to unconditionally process everything as YAML, assuming you have a YAML 1.2-compatible parser (the usual yaml module, PyYAML, isn’t one).
I conclude that you are using the old PyYAML package. It only supports YAML 1.1 (from 2005), and the format specified there is not a full superset of JSON. With YAML 1.2 (released in 2009), the YAML format became a superset of JSON.
ruamel.yaml (disclaimer: I am the author of that package) supports YAML 1.2. You can install it in your Python virtual environment with pip install ruamel.yaml. By replacing PyYAML with ruamel.yaml (rather than adding a package), you can just do:
    import os
    from ruamel.yaml import YAML

    config_file = "some_config_file"

    yaml = YAML()
    with open(config_file, "r") as in_fh:
        config_dict = yaml.load(in_fh)
and load the file into config_dict without caring whether the input is YAML or JSON, so there is no need to test for either format.
Years later I ran into the same problem. I fully agree with EAFP, but I’m still trying to find the best way to detect whether a configuration file is JSON or YAML.
In my code I have methods that tell the user where the mistake is in a JSON file and where it is in a YAML file. try/except did not handle this the way I wanted, and my eyes bleed when I see those nested blocks.
This is not perfect and still has minor issues, but the basic concept fits my needs. I’d say it is "good enough".
My solution is to find all standalone commas in the configuration file. If the file contains standalone commas (the separators in JSON), it is a JSON file; if no such commas are found, it is YAML.
In my YAML files I use commas only in comments (between " ") and in lists (between [ ]).
Maybe someone will find it useful.
    import re
    from pathlib import Path

    # Find all commas which are standalone:
    #   - not between quotes (comments, answers)
    #   - not between brackets (lists)
    commas = re.compile(r',(?=(?!["]*[sw?."!-_]*,))(?=(?![^*]))')


    def detect_format(file_path):
        with file_path.open('r') as fh:
            signs = commas.findall(fh.read())
        return "json" if len(signs) > 0 else "yaml"


    file_path = Path("example_file.cfg")
    print(detect_format(file_path))
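For the error-reporting part mentioned above (telling the user where the mistake is), here is a minimal sketch assuming Python 3.5 or newer, where json.JSONDecodeError carries lineno, colno and msg attributes. The function name describe_json_error is hypothetical. As far as I know, PyYAML exposes similar position information through the problem_mark attribute of yaml.MarkedYAMLError, but that part is not shown here.

```python
import json


def describe_json_error(text):
    """Return None if text is valid JSON, else a 'line X, column Y: msg' string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno and colno are 1-based positions of the first problem found
        return "line %d, column %d: %s" % (exc.lineno, exc.colno, exc.msg)
    return None


broken = '{"a": 1,\n "b": }'
print(describe_json_error(broken))
```

This keeps the try/except in one small helper, so the calling code does not need nested blocks.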
I don’t know if this has been answered already, but here is a way to do it based on the file extension:
    import json
    import pathlib

    import yaml


    def input_parameters(file):
        default_ext = '.json'  # set a default extension
        file_ext = pathlib.Path(file).suffix
        with open(file, 'r') as f:
            if file_ext == default_ext:
                input_file = json.load(f)
            else:
                input_file = yaml.safe_load(f)
        return input_file
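The function above treats every file that does not end in .json as YAML. A possible refinement, sketched here with the hypothetical names SUFFIX_TO_FORMAT and guess_format, is to make the extension mapping explicit and return None for unknown extensions, so the caller can fall back to a content-based check such as try/except.

```python
from pathlib import Path

# Assumed mapping; extend it with whatever extensions your users actually use.
SUFFIX_TO_FORMAT = {".json": "json", ".yaml": "yaml", ".yml": "yaml"}


def guess_format(filename):
    """Return 'json', 'yaml', or None based only on the file extension."""
    return SUFFIX_TO_FORMAT.get(Path(filename).suffix.lower())


for name in ("settings.json", "settings.yml", "settings.cfg"):
    print(name, guess_format(name))
```

Note that this only looks at the file name; it says nothing about whether the content is actually valid JSON or YAML.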