Is there a way to determine whether a file is in YAML or JSON format?

Question:

I have a Python test script that requires a configuration file. The configuration file is expected to be in JSON format.

But some of the users of my test script dislike the JSON format because it’s unreadable.

So I changed my test script so that it expects the configuration file in YAML format, then converts the YAML file to a JSON file.

I would prefer that the function that loads the configuration file handle both JSON and YAML. Is there a method in either the yaml or json module that returns a Boolean indicating whether the configuration file is JSON or YAML?

My workaround right now is to use two try/except clauses:

import sys
import json
import yaml

# This is the configuration file - my script gets it from argparser but in
# this example, let's just say it is some file that I don't know what the format
# is
config_file = "some_config_file"

in_fh = open(config_file, "r")

config_dict = dict()
valid_json = True
valid_yaml = True

try:
    config_dict = json.load(in_fh)
except ValueError:
    print("Error trying to load the config file in JSON format")
    valid_json = False

# Rewind, because json.load() has already consumed the file
in_fh.seek(0)

try:
    config_dict = yaml.safe_load(in_fh)
except yaml.YAMLError:
    print("Error trying to load the config file in YAML format")
    valid_yaml = False

in_fh.close()

if not valid_yaml and not valid_json:
    print("The config file is neither JSON nor YAML")
    sys.exit(1)

Now, there is a Python module I found on the Internet called isityaml that can be used to test for YAML. But I’d prefer not to install another package because I have to install this on several test hosts.

Do the json and yaml modules have a method that returns a Boolean indicating whether the input is in their respective format?

config_file = "sample_config_file"

# I would like some method like this
if json.is_json(in_fh):
    config_dict = json.load(in_fh)
Asked By: SQA777

Answers:

From looking at the json and yaml modules’ documentation, it looks like they don’t offer any appropriate methods. However, a common Python idiom is EAFP (“easier to ask forgiveness than permission”); in other words, go ahead and try to do the operation, and deal with exceptions if they arise.

import json
import yaml

def load_config(config_file):
    with open(config_file, "r") as in_fh:
        # Read the file into memory as a string so that we can try
        # parsing it twice without seeking back to the beginning and
        # re-reading.
        config = in_fh.read()

    try:
        return json.loads(config)
    except ValueError:
        print("Error trying to load the config file in JSON format")

    try:
        return yaml.safe_load(config)
    except yaml.YAMLError:
        print("Error trying to load the config file in YAML format")

    # Neither parser accepted the file
    raise ValueError("The config file is neither JSON nor YAML")

You could make your own is_json or is_yaml function if you wanted. This would involve processing the configuration twice, but that may be okay for your purposes.

def try_as(loader, s, on_error):
    try:
        loader(s)
        return True
    except on_error:
        return False

def is_json(s):
    return try_as(json.loads, s, ValueError)

def is_yaml(s):
    # yaml.YAMLError is the base class, so parser errors are caught
    # as well as scanner errors.
    return try_as(yaml.safe_load, s, yaml.YAMLError)
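
If you want a single answer rather than two Booleans, the same try-and-see idea can be folded into one helper. This is only a sketch: detect_format is a made-up name, and the PyYAML import is guarded on the assumption that some hosts may not have it installed. JSON is tried first because a YAML parser will usually accept JSON text too, so the order matters.

```python
import json

try:
    import yaml  # PyYAML (third-party); assumed to be optional here
except ImportError:
    yaml = None


def detect_format(text):
    """Return "json", "yaml", or None for a config string (hypothetical helper)."""
    try:
        json.loads(text)
        return "json"
    except ValueError:
        pass  # not JSON, fall through to the YAML attempt

    if yaml is not None:
        try:
            yaml.safe_load(text)
            return "yaml"
        except yaml.YAMLError:
            pass  # not YAML either

    return None


print(detect_format('{"retries": 3}'))  # json
```

Keep in mind that almost any plain text is a valid YAML scalar, so a "yaml" result here only means the parser did not raise.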

Finally, as @user2357112 alluded to, “every JSON file is also a valid YAML file” (as of YAML 1.2), so you should be able to unconditionally process everything as YAML (assuming you have a YAML 1.2-compatible parser; PyYAML, the package behind import yaml, only implements YAML 1.1).

Answered By: Josh Kelley

From your

import yaml

I conclude that you use the old PyYAML. That package only supports YAML 1.1 (from 2005), and the format specified there is not a full superset of JSON. With YAML 1.2 (released in 2009), the YAML format became a superset of JSON.

The package ruamel.yaml (disclaimer: I am the author of that package) supports YAML 1.2. You can install it in your Python virtual environment with pip install ruamel.yaml. By replacing PyYAML with ruamel.yaml (rather than adding a package), you can just do:

import os
from ruamel.yaml import YAML

config_file = "some_config_file"

yaml = YAML()
with open(config_file, "r") as in_fh:
    config_dict = yaml.load(in_fh)

and load the file into config_dict without caring whether the input is YAML or JSON, so there is no need to test for either format.
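
As a quick check of that claim, the sketch below feeds a JSON string to the YAML loader. The helper name parse_config_text and the sample data are made up for illustration, and the fallback to the standard json module is an assumption added so the snippet still runs on a host where ruamel.yaml is not installed.

```python
import json

try:
    from ruamel.yaml import YAML  # third-party: pip install ruamel.yaml
except ImportError:
    YAML = None  # hosts without the package can still read JSON


def parse_config_text(text):
    """Parse YAML or JSON text (hypothetical helper)."""
    if YAML is not None:
        # typ="safe" returns plain dicts and lists
        return YAML(typ="safe").load(text)
    return json.loads(text)


# Sample JSON input, invented for this example
sample = '{"name": "smoke-test", "retries": 3, "hosts": ["a", "b"], "verbose": true}'
config = parse_config_text(sample)
print(config["hosts"])
```

With ruamel.yaml installed, the same call should also accept block-style YAML such as "retries: 3".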

Answered By: Anthon

Years later I ran into the same problem. I fully agree with EAFP, but I was still looking for the best way to detect whether the configuration file is in JSON or YAML format.
My code has methods that tell the user where the mistake is in a JSON file and where it is in a YAML file. try/except did not handle this the way I wanted, and my eyes bleed when I see those nested blocks.

This is not perfect, still has minor issues, but for me, the basic concept fits my needs. I’d say "good enough".

My solution is to find all standalone commas in the configuration file. If the file contains standalone commas (the separators in JSON), it is a JSON file; if no such commas are found, it is YAML.
In my YAML files I use commas only in comments (between " ") and in lists (between [ ]).
Maybe someone will find it useful.

import re
from pathlib import Path

commas = re.compile(r',(?=(?!["]*[\s\w?."!\-_]*,))(?=(?![^\[]*\]))')
"""
Find all commas which are standalone 
 - not between quotes - comments, answers
 - not between brackets - lists
"""
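def detect_format(file_path):
    # Any standalone comma means JSON; no standalone commas means YAML.
    with Path(file_path).open('r') as fh:
        signs = commas.findall(fh.read())
    return "json" if len(signs) > 0 else "yaml"

file_format = detect_format("example_file.cfg")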
Answered By: Marek Gancarz

I don’t know if this has been answered already, but here is a way to do it by checking the file extension:

import json
import pathlib

import yaml

def input_parameters(file):
    default_ext = '.json'  # set a default extension
    file_ext = pathlib.Path(file).suffix
    with open(file, 'r') as f:
        if file_ext == default_ext:
            input_file = json.load(f)
        else:
            input_file = yaml.safe_load(f)
    return input_file
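
A stricter variant of the same extension check might reject unknown suffixes instead of treating everything that is not .json as YAML. This is a sketch, not a drop-in replacement: LOADERS and load_by_suffix are hypothetical names, and the PyYAML import is guarded on the assumption that it may be missing on some hosts.

```python
import json
import pathlib

try:
    import yaml  # PyYAML (third-party); assumed to be optional here
except ImportError:
    yaml = None

# Map file suffixes to the function that parses an open file
LOADERS = {".json": json.load}
if yaml is not None:
    LOADERS[".yaml"] = yaml.safe_load
    LOADERS[".yml"] = yaml.safe_load


def load_by_suffix(path):
    """Load a config file with the parser chosen by its suffix."""
    suffix = pathlib.Path(path).suffix.lower()
    if suffix not in LOADERS:
        raise ValueError("Unsupported config suffix: %r" % suffix)
    with open(path, "r") as f:
        return LOADERS[suffix](f)
```

The trade-off is that a file with a misleading or missing extension fails up front instead of being handed to the YAML parser.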
Answered By: Sheikhsspear