Can json.loads ignore trailing commas?

Question:

As mentioned in this StackOverflow question, you are not allowed to have any trailing commas in JSON. For example, this

{
    "key1": "value1",
    "key2": "value2"
}

is fine, but this

{
    "key1": "value1",
    "key2": "value2",
}

is invalid syntax.
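This is easy to confirm with the standard library. `json.loads` accepts the first object and raises `json.JSONDecodeError` (a `ValueError` subclass, available since Python 3.5) on the second. The helper name `parses` exists only for illustration:

```python
import json

valid = '{"key1": "value1", "key2": "value2"}'
invalid = '{"key1": "value1", "key2": "value2",}'


def parses(text):
    """Return True if the standard json module accepts text (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(parses(valid))    # True
print(parses(invalid))  # False
```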

For reasons mentioned in this other StackOverflow question, using a trailing comma is legal (and perhaps encouraged?) in Python code. I am working with both Python and JSON, so I would love to be able to be consistent across both types of files. Is there a way to have json.loads ignore trailing commas?

Asked By: Rob Watts


Answers:

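Strip the trailing commas before you pass the string to json.loads:

import re

def clean_json(string):
    # Drop a comma followed only by (zero or more) whitespace and a closing brace/bracket.
    string = re.sub(r",[ \t\r\n]*}", "}", string)
    string = re.sub(r",[ \t\r\n]*\]", "]", string)

    return string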
Answered By: andrewgrz
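One caveat with the regexes above is that they also rewrite commas inside string values, so a value like "x, }" would become "x}". The following is a minimal standard-library sketch that avoids this. It uses a single pattern whose first alternative matches a whole JSON string literal, which is put back unchanged, and whose second alternative matches a trailing comma. The name `loads_trailing_commas` is hypothetical, and the sketch does not handle comments.

```python
import json
import re

# Hypothetical helper pattern. Alternative 1 (group 1) matches a complete JSON
# string literal, including escaped characters such as \" inside it.
# Alternative 2 matches a comma followed only by whitespace and a closing
# } or ]; group 2 captures everything after the comma.
_STRING_OR_TRAILING_COMMA = re.compile(r'("(?:\\.|[^"\\])*")|,(\s*[}\]])')


def _drop_trailing_comma(match):
    # String literals are returned unchanged; for a trailing comma, return
    # only the whitespace and closing bracket, i.e. drop the comma itself.
    if match.group(1) is not None:
        return match.group(1)
    return match.group(2)


def loads_trailing_commas(text):
    """json.loads that tolerates trailing commas (sketch, no comment support)."""
    return json.loads(_STRING_OR_TRAILING_COMMA.sub(_drop_trailing_comma, text))


print(loads_trailing_commas('{"a": "x, }", "b": [1, 2,],}'))
# {'a': 'x, }', 'b': [1, 2]}
```

Matching string literals first keeps the scan position from ever landing inside a string. Because of that, the comma alternative only sees commas that are actually part of the JSON structure.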

You can wrap Python’s json parser with jsoncomment:

JSON Comment allows you to parse JSON files or strings with:

  • Single and Multi line comments
  • Multi line data strings
  • Trailing commas in objects and arrays, after the last item

Example usage:

import json
from jsoncomment import JsonComment

with open(filename) as data_file:  # filename is the path to your JSON file
    parser = JsonComment(json)
    data = parser.load(data_file)
Answered By: Steve Lorimer

In Python, trailing commas are allowed inside dict and list literals, so we can take advantage of this with ast.literal_eval:

import ast, json

s = '{"key1": "value1", "key2": "value2",}'

python_obj = ast.literal_eval(s)
# python_obj is {'key1': 'value1', 'key2': 'value2'}

json_str = json.dumps(python_obj)
# json_str is '{"key1": "value1", "key2": "value2"}'

However, JSON isn’t exactly Python, so there are a few edge cases. For example, the JSON values null, true, and false don’t exist in Python. We can replace them with their Python equivalents (None, True, False) before we run the eval:

import ast, json

def clean_json(s):
    s = s.replace('null', 'None').replace('true', 'True').replace('false', 'False')
    return json.dumps(ast.literal_eval(s))

Unfortunately, this will also mangle any strings that contain the words null, true, or false. For example,

{"sentence": "show your true colors"} 

would become

{"sentence": "show your True colors"}
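One way around the mangling problem, still using only the standard library, is to let Python's `tokenize` module split the text first and then rename only bare name tokens. That way null/true/false inside string literals are left alone. This is a sketch and not part of the original answer. `loads_via_literal_eval` is a hypothetical name, and it returns Python objects rather than a JSON string. JSON and Python string escapes are also not identical (for example `"\/"`), so unusual strings may still come out differently.

```python
import ast
import io
import json
import tokenize

# JSON literals that have different spellings in Python.
_JSON_TO_PYTHON = {"null": "None", "true": "True", "false": "False"}


def loads_via_literal_eval(text):
    """Parse JSON with trailing commas via ast.literal_eval (sketch)."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(text).readline):
        # Only bare NAME tokens are renamed; STRING tokens pass through
        # untouched, so "show your true colors" keeps its lowercase "true".
        if tok.type == tokenize.NAME and tok.string in _JSON_TO_PYTHON:
            tokens.append((tok.type, _JSON_TO_PYTHON[tok.string]))
        else:
            tokens.append((tok.type, tok.string))
    # untokenize() with (type, string) pairs rebuilds valid source text,
    # although the spacing may differ from the input.
    return ast.literal_eval(tokenize.untokenize(tokens))


data = loads_via_literal_eval('{"sentence": "show your true colors", "flag": true,}')
print(data)              # {'sentence': 'show your true colors', 'flag': True}
print(json.dumps(data))  # {"sentence": "show your true colors", "flag": true}
```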
Answered By: Porkbutts

Cobbling together the knowledge from a few other answers, especially the idea of using literal_eval from @Porkbutts’ answer, I present a wildly evil solution to this problem:

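def json_cleaner_loader(path):
    namespace = {}
    with open(path) as fh:
        # Define JSON's literals as Python names, then bind the file contents to d.
        exec("null=None;true=True;false=False;d={}".format(fh.read()), namespace)
    return namespace["d"]

This works by defining the missing constants as their Pythonic values before evaluating the JSON structure as Python code. The structure is then read back from the namespace dictionary passed to exec(). This is more reliable than reading it from locals() inside the function, which no longer sees names created by exec() as of Python 3.13.

This should work with both Python 2.7 and Python 3.x.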

BEWARE: this will execute whatever is in the passed file, which may do anything the Python interpreter can. Only ever use it on inputs which are known to be safe (i.e., don’t let web clients provide the content), and probably not in any production environment.
This probably also fails if it’s given a very large amount of content.


Late addendum: A side effect of this (awful) approach is that it supports Python comments within the JSON (JSON-like?) data, though it’s hard to compare that to even friendly non-standard behavior.

Answered By: ti7

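Use rapidjson (the python-rapidjson package), which can be told to accept comments and trailing commas. Note that load takes an open file rather than a path:

import rapidjson

with open("file.json") as fh:
    data = rapidjson.load(fh, parse_mode=rapidjson.PM_COMMENTS | rapidjson.PM_TRAILING_COMMAS)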
Answered By: user404906

Fast-forward to 2021: now we have https://pypi.org/project/json5/

A quote from the link:

A Python implementation of the JSON5 data format.

JSON5 extends the JSON data interchange format to make it slightly
more usable as a configuration language:

  • JavaScript-style comments (both single and multi-line) are legal.
  • Object keys may be unquoted if they are legal ECMAScript identifiers
  • Objects and arrays may end with trailing commas.
  • Strings can be single-quoted, and multi-line string literals are
    allowed.

Usage is consistent with Python’s built-in json module:

>>> import json5
>>> json5.loads('{"key1": "{my special value,}",}')
{'key1': '{my special value,}'}

It does come with a warning:

Known issues

  • Did I mention that it is SLOW?

It is, however, fast enough for loading startup config, etc.

Answered By: AnyDev

If I don’t have the option of using any external module, my typical approach is to first just sanitize the input (i.e. remove the trailing commas and comments) and then use the built-in JSON parser.

Here’s an example that uses three regular expressions. The first two strip single-line and multi-line comments, and the third strips trailing commas from the JSON input string. The result is then passed to the built-in json.loads method.

#!/usr/bin/env python

import json, re, sys

unfiltered_json_string = '''
{
    "name": "Grayson",
    "age": 45,
    "car": "A3",
    "flag": false,
    "default": true,
    "entries": [ // "This is the beginning of the comment with some quotes" """""
        "red", // This is another comment. " "" """ """"
        null, /* This is a multi line comment //
"Here's a quote on another line."
*/
        false,
        true,
    ],
    "object": {
        "key3": null,
        "key2": "This is a string with some comment characters // /* */ // /////.",
        "key1": false,
    },
}
'''

# Group 1 of each comment pattern matches a complete string literal, so comment-like
# text inside strings is kept; a matched comment is replaced by the empty group 1.
RE_SINGLE_LINE_COMMENT = re.compile(r'("(?:(?=(\\?))\2.)*?")|(?:/{2,}.*)')
RE_MULTI_LINE_COMMENT = re.compile(r'("(?:(?=(\\?))\2.)*?")|(?:/\*(?:(?!\*/).)+\*/)', flags=re.M|re.DOTALL)
# A comma followed only by whitespace and a closing } or ].
RE_TRAILING_COMMA = re.compile(r',(?=\s*?[}\]])')

if sys.version_info < (3, 5):
    # For Python versions before 3.5, use the patched copy of re.sub
    # (older re.sub fails on unmatched groups in the replacement).
    # Based on https://gist.github.com/gromgull/3922244
    def patched_re_sub(pattern, repl, string, count=0, flags=0):
        def _repl(m):
            class _match():
                def __init__(self, m):
                    self.m=m
                    self.string=m.string
                def group(self, n):
                    return self.m.group(n) or ''
            return re._expand(pattern, _match(m), repl)
        return re.sub(pattern, _repl, string, count=count, flags=flags)
    filtered_json_string = patched_re_sub(RE_SINGLE_LINE_COMMENT, r'\1', unfiltered_json_string)
    filtered_json_string = patched_re_sub(RE_MULTI_LINE_COMMENT, r'\1', filtered_json_string)
else:
    filtered_json_string = RE_SINGLE_LINE_COMMENT.sub(r'\1', unfiltered_json_string)
    filtered_json_string = RE_MULTI_LINE_COMMENT.sub(r'\1', filtered_json_string)
filtered_json_string = RE_TRAILING_COMMA.sub('', filtered_json_string)

json_data = json.loads(filtered_json_string)
print(json.dumps(json_data, indent=4, sort_keys=True))
Answered By: Grayson Lang