How to parse json file with c-style comments?
Question:
I have a json file, such as the following:
{
"author":"John",
"desc": "If it is important to decode all valid JSON correctly
and speed isn't as important, you can use the built-in json module,
orsimplejson. They are basically the same but sometimes simplej
further along than the version of it that is included with
distribution."
//"birthday": "nothing" //I comment this line
}
This file is auto created by another program. How do I parse it with Python?
Answers:
I have not personally used it, but the jsoncomment python package supports parsing a JSON file with comments.
You use it in place of the JSON parser as follows:
import json
from jsoncomment import JsonComment
parser = JsonComment(json)
parsed_object = parser.loads(jsonString)
I cannot imagine that a JSON file “auto created by another program” would contain comments. The JSON spec defines no comments at all, and that is by design, so no JSON library would write a JSON file with comments.
Those comments are usually added later, by a human. This case is no exception; the OP's post contains the line: //"birthday": "nothing" //I comment this line
So the real question should be: how do I properly comment out some content in a JSON file while keeping it compliant with the spec, and hence compatible with other JSON libraries?
And the answer is: rename your field. Example:
{
"foo": "content for foo",
"bar": "content for bar"
}
can be changed into:
{
"foo": "content for foo",
"this_is_bar_but_been_commented_out": "content for bar"
}
This will work just fine most of the time because the consumer will very likely ignore unexpected fields (but not always; it depends on how the consumer of your JSON file is implemented, so YMMV).
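To illustrate why renaming usually works, here is a minimal sketch of a consumer that reads only the keys it knows about. The load_settings helper and the default value are made up for illustration; a real consumer may behave differently (for example, a strict schema validator could reject the unknown key).

```python
import json

RAW = """
{
  "foo": "content for foo",
  "this_is_bar_but_been_commented_out": "content for bar"
}
"""

def load_settings(text):
    """Parse strict JSON and pick out only the fields this consumer knows about."""
    data = json.loads(text)
    return {
        "foo": data.get("foo"),
        # "bar" was renamed away, so the (hypothetical) default is used instead
        "bar": data.get("bar", "default for bar"),
    }

settings = load_settings(RAW)
print(settings)
```

The renamed field is still present in the parsed data; it is simply never looked up, which is what makes it behave like a commented-out entry.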
UPDATE: Apparently some readers were unhappy because this answer does not give the “solution” they expected. Well, in fact, I did give a working solution, by implicitly linking to the JSON designer’s quote:
Douglas Crockford Public Apr 30, 2012 Comments in JSON
I removed comments from JSON because I saw people were using them to
hold parsing directives, a practice which would have destroyed
interoperability. I know that the lack of comments makes some people
sad, but it shouldn’t.
Suppose you are using JSON to keep configuration files, which you
would like to annotate. Go ahead and insert all the comments you like.
Then pipe it through JSMin before handing it to your JSON parser.
So, yeah, go ahead and use JSMin. Just keep in mind that once you are heading towards “using comments in JSON”, you are in conceptually uncharted territory. There is no guarantee that whatever tools you choose will handle: inline [1,2,3,/* a comment */ 10], Python style [1, 2, 3] # a comment (which is a comment in Python but not in JavaScript), INI style [1, 2, 3] ; a comment, …, you get the idea.
I would still suggest NOT adding noncompliant comments to JSON in the first place.
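One way to find out what a given parser actually tolerates is to probe it with one sample per comment style. The sketch below does that for the standard json module; the probe helper and the sample names are hypothetical, and catching ValueError is an assumption that holds for json (JSONDecodeError is a subclass of it) but may need widening for other libraries.

```python
import json

SAMPLES = {
    "inline block comment": '[1, 2, 3, /* a comment */ 10]',
    "C++ style": '[1, 2, 3] // a comment',
    "Python style": '[1, 2, 3] # a comment',
    "INI style": '[1, 2, 3] ; a comment',
    "no comment": '[1, 2, 3]',
}

def probe(loads, samples=SAMPLES):
    """Return {style: True/False} depending on whether `loads` accepts each sample."""
    results = {}
    for name, text in samples.items():
        try:
            loads(text)
            results[name] = True
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            results[name] = False
    return results

report = probe(json.loads)
for name, ok in report.items():
    print(f"{name:22} {'accepted' if ok else 'rejected'}")
```

With the standard library, only the comment-free sample is accepted. Passing another library's loads function to probe gives a quick picture of which styles that tool supports.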
jsoncomment is good, but inline comments are not supported.
Check out jstyleson, which supports
- inline comments
- single-line comments
- multi-line comments
- trailing commas.
Comments are NOT preserved. jstyleson first removes all comments and trailing commas, then uses the standard json module. It seems that function arguments are forwarded and work as expected. It also exposes dispose to return the cleaned string contents without parsing.
Example
Install
pip install jstyleson
Usage
import jstyleson
result_dict = jstyleson.loads(invalid_json_str) # OK
jstyleson.dumps(result_dict)
How about commentjson?
http://commentjson.readthedocs.io/en/latest/
This can parse something like below.
{
"name": "Vaidik Kapoor", # Person's name
"location": "Delhi, India", // Person's location
# Section contains info about
// person's appearance
"appearance": {
"hair_color": "black",
"eyes_color": "black",
"height": "6"
}
}
Like Elasticsearch, some products’ REST APIs do not accept a comment field. Therefore, I think comments inside JSON are necessary for a client in order to maintain things such as a JSON template.
EDITED
jsmin seems to be more common.
If you are like me and prefer avoiding external libraries, this function I wrote will read JSON from a file and remove "//" and "/* */" type comments:
def GetJsonFromFile(filePath):
    contents = ""
    # The with-block closes the file even if reading fails
    with open(filePath) as fh:
        for line in fh:
            # Drop everything from the first "//" to the end of the line
            cleanedLine = line.split("//", 1)[0]
            # Put back the newline that was cut off together with the comment
            if len(cleanedLine) > 0 and line.endswith("\n") and "\n" not in cleanedLine:
                cleanedLine += "\n"
            contents += cleanedLine
    # Remove /* ... */ blocks, which may span several lines
    while "/*" in contents:
        preComment, postComment = contents.split("/*", 1)
        contents = preComment + postComment.split("*/", 1)[1]
    return contents
Limitations: As David F. brought up in the comments, this will break beautifully (i.e. horribly) with // and /* inside string literals. You would need to write some code around it if you want to support //, /*, */ within your JSON string contents.
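A possible way around that limitation is to walk the text character by character and track whether the current position is inside a string literal. The following is only a sketch under that assumption, not a tested replacement for a real parser; strip_comments and the sample document are made up for illustration.

```python
import json

def strip_comments(text):
    """Remove // and /* */ comments while leaving string literals untouched."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character too, so \" does not end the string
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line but keep the newline itself
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated /* comment")
            # Keep the newlines so parser error messages report the right line
            out.append("\n" * text.count("\n", i, end))
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

sample = r'''{
  "url": "http://example.com/*not a comment*/", // real comment
  /* block
     comment */
  "quote": "she said \"// hi\""
}'''

data = json.loads(strip_comments(sample))
print(data)
```

Here the // in the URL and the /* */ inside the string survive, while the real comments are removed before json.loads sees the text.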
You might look at JSON5 if you don’t really care about strict by-the-book JSON formatting and just want something that allows comments in JSON. For example, this library will let you parse JSON5: https://pypi.org/project/json5/
In short: use jsmin
pip install jsmin
import json
from jsmin import jsmin
with open('parameters.jsonc') as js_file:
    minified = jsmin(js_file.read())
parameters = json.loads(minified)
I recommend everyone switch to a JSON5 library instead. JSON5 is JSON extended with JavaScript syntax features, and it is probably the most popular JSON language extension. It has comments, trailing commas in objects/arrays, single-quoted keys/strings, unquoted object keys, etc. And there are proper parser libraries with deep test suites.
There are two different, high-quality Python implementations:
- https://github.com/dpranke/pyjson5 (written entirely in Python, so it is slow; has its own test suite; project started in 2015 and is more "liked"). PyPI page: https://pypi.org/project/json5/
- Recommended: https://github.com/Kijewski/pyjson5 (uses compiled native code via Cython, which is much faster; uses the official json5 js test suite instead of its own; project started in 2018). PyPI page: https://pypi.org/project/pyjson5/
Here’s the JSON5 spec: https://json5.org/
Here’s a small standalone wrapper:
#!/usr/bin/env python3
import json
import re

def json_load_nocomments( filename_or_fp, comment = "//|#", **jsonloadskw ) -> "json dict":
    """ load json, skipping comment lines starting // or #
        or white space //, or white space #
    """
    # filename_or_fp -- lines -- filter out comments -- bigstring -- json.loads
    if hasattr( filename_or_fp, "readlines" ):  # open() or file-like
        lines = filename_or_fp.readlines()
    else:
        with open( filename_or_fp ) as fp:
            lines = fp.readlines()  # with \n
    iscomment = re.compile( r"\s*(" + comment + ")" ).match
    notcomment = lambda line: not iscomment( line )  # ifilterfalse
    bigstring = "".join( filter( notcomment, lines ))
    # json.load( fp ) does loads( fp.read() ), the whole file in memory
    return json.loads( bigstring, **jsonloadskw )

if __name__ == "__main__":  # sanity test
    import sys
    for jsonfile in sys.argv[1:] or ["test.json"]:
        print( "\n-- " + jsonfile )
        jsondict = json_load_nocomments( jsonfile )
        # first few keys, val type --
        for key, val in list( jsondict.items() )[:10]:
            n = (len(val) if isinstance( val, (dict, list, str) )
                else "" )
            print( "%-10s : %s %s" % (
                key, type(val).__name__, n ))
For the [95% of] cases when you just need simple leading // line comments, here is a simple way to handle them:
import json
from typing import Any

class JSONWithCommentsDecoder(json.JSONDecoder):
    def __init__(self, **kw):
        super().__init__(**kw)

    def decode(self, s: str) -> Any:
        # Drop every line whose first non-space characters are //
        s = '\n'.join(l for l in s.split('\n') if not l.lstrip(' ').startswith('//'))
        return super().decode(s)

your_obj = json.load(f, cls=JSONWithCommentsDecoder)
Improving a previous answer to provide correct line number support:
import json
from typing import Any

class JSONWithCommentsDecoder(json.JSONDecoder):
    def __init__(self, **kw):
        super().__init__(**kw)

    def decode(self, s: str) -> Any:
        # Replace comment lines with empty lines so later lines keep their numbers
        s = '\n'.join(l if not l.lstrip().startswith('//') else '' for l in s.split('\n'))
        return super().decode(s)
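The difference between the two decoders shows up in error messages. As a rough illustration (the helper names and the deliberately broken sample are made up here), the sketch below compares the line number reported by json when comment lines are dropped versus blanked:

```python
import json

# Line 5 contains a deliberate syntax error (bare word instead of a value)
TEXT = """{
// first comment
// second comment
"a": 1,
"b": oops
}"""

def drop_comment_lines(s):
    """Remove comment lines entirely (shifts later lines up)."""
    return "\n".join(l for l in s.split("\n") if not l.lstrip().startswith("//"))

def blank_comment_lines(s):
    """Replace comment lines with empty lines (line numbers stay put)."""
    return "\n".join("" if l.lstrip().startswith("//") else l for l in s.split("\n"))

def error_line(s):
    """Return the line number json reports for the first syntax error, if any."""
    try:
        json.loads(s)
    except json.JSONDecodeError as exc:
        return exc.lineno
    return None

dropped_line = error_line(drop_comment_lines(TEXT))
blanked_line = error_line(blank_comment_lines(TEXT))
print(dropped_line, blanked_line)  # prints: 3 5
```

Only the blanked variant points at line 5, which is where the error sits in the original file.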
C-style comments are officially part of the JSON5 specification.
❗️Important: Before you go any further please note that JSON5 and JSON are two different formats although compatible.
From json5.org:
JSON5 is an extension to the popular JSON file format that aims to be easier to write and maintain by hand (e.g. for config files). It is not intended to be used for machine-to-machine communication. (Keep using JSON or other file formats for that.)
- Install json5 with:
  pip3 install json5
- Use json5 instead of json:
import json5
print(json5.loads("""{
"author": "John",
"desc": "If it is import..",
// "birthday": "nothing"
}"""))
### OUTPUT: {'author': 'John', 'desc': 'If it is import..'}