Unable to parse TAB in JSON files
Question:
I am running into a parsing problem when loading JSON files that seem to have the TAB character in them.
When I go to http://jsonlint.com/, and I enter the part with the TAB character:
{
"My_String": "Foo bar. Bar foo."
}
The validator complains with:
Parse error on line 2:
{ "My_String": "Foo bar. Bar foo."
------------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '['
This is literally a copy/paste of the offending JSON text.
I have tried loading this file with json and simplejson without success. How can I load this properly? Should I just pre-process the file and replace TAB by \t or by a space? Or is there anything that I am missing here?
Update:
Here is also a problematic example in simplejson:
foo = '{"My_string": "Foo bar.\t Bar foo."}'
simplejson.loads(foo)
JSONDecodeError: Invalid control character '\t' at: line 1 column 24 (char 23)
Answers:
Tabs are legal as delimiting whitespace outside of values, but not within strings. To get a tab inside a JSON string you need to use the escape sequence \t instead.
But beware multiple levels of interpretation. This Python string from your update:
foo = '{"My_string": "Foo bar.\t Bar foo."}'
is not valid JSON, because the Python interpreter turns that \t sequence into an actual tab character before the JSON processor ever sees it.
You can tell Python to put a literal \t in the string instead of a tab character by doubling the backslash:
foo = '{"My_string": "Foo bar.\\t Bar foo."}'
Or you can use the "raw" string syntax, which doesn’t interpret any special backslash sequences:
foo = r'{"My_string": "Foo bar.\t Bar foo."}'
Either way, the JSON processor will see a string containing a backslash followed by a ‘t’, rather than a string containing a tab.
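To make the two levels of interpretation concrete, here is a minimal sketch that compares the three literals using the standard json module. The helper name try_load is only for illustration and is not part of any library.

```python
import json

# Plain literal: Python turns \t into a real tab before json ever sees it.
plain = '{"My_string": "Foo bar.\t Bar foo."}'
# Doubled backslash and raw literal: json sees a backslash followed by 't'.
doubled = '{"My_string": "Foo bar.\\t Bar foo."}'
raw = r'{"My_string": "Foo bar.\t Bar foo."}'


def try_load(text):
    """Return the parsed object, or an error string if parsing fails."""
    try:
        return json.loads(text)
    except ValueError as exc:  # json's decode errors are ValueErrors
        return "error: %s" % exc


print(doubled == raw)         # the two spellings are the same string
print(len(raw) - len(plain))  # the escaped form is one character longer
print(try_load(plain))        # rejected: literal tab inside a string
print(try_load(raw))          # accepted: the value contains a tab
```

The doubled-backslash and raw forms are the same Python string, so which one to use is a matter of readability.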
You can include tabs within values (instead of as whitespace) in JSON files by escaping them. Here’s a working example with the json module in Python 2.7:
>>> import json
>>> obj = json.loads('{"MY_STRING": "Foo\\tBar"}')
>>> obj['MY_STRING']
u'Foo\tBar'
>>> print obj['MY_STRING']
Foo	Bar
Whereas not escaping the '\t' causes an error:
>>> json.loads('{"MY_STRING": "Foo\tBar"}')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 19 (char 18)
From JSON standard:
Insignificant whitespace is allowed before or after any token. The
whitespace characters are: character tabulation (U+0009), line feed
(U+000A), carriage return (U+000D), and space (U+0020). Whitespace is
not allowed within any token, except that space is allowed in
strings.
It means that a literal tab character is not allowed inside a JSON string. You need to escape it as \t (in a .json file):
{"My_string": "Foo bar.\t Bar foo."}
In addition, if the JSON text is provided inside a Python string literal, then you need to double-escape the tab:
foo = '{"My_string": "Foo bar.\\t Bar foo."}' # in a Python source
Or use a Python raw string literal:
foo = r'{"My_string": "Foo bar.\t Bar foo."}' # in a Python source
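If the file already contains literal tabs inside strings and you cannot fix it at the source, one option is the strict flag of Python's json decoder, which allows control characters inside strings when set to False. The sketch below is one possible approach rather than the only one. The variable file_text stands in for the contents of such a file. Blindly replacing every tab with \t before parsing is riskier, because it would also rewrite tabs used as indentation outside strings.

```python
import json

# Stands in for the text of a .json file that has a literal tab (U+0009)
# inside a string value.
file_text = '{"My_string": "Foo bar.\t Bar foo."}'

# strict=False lets the decoder accept control characters inside strings.
lenient = json.loads(file_text, strict=False)

# json.dumps escapes the tab on output, so the re-serialized text is
# valid for a strict parser as well.
fixed_text = json.dumps(lenient)
roundtrip = json.loads(fixed_text)

print(repr(lenient["My_string"]))
print(fixed_text)
```

Writing fixed_text back to disk would give a file that any strict JSON parser accepts.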
Just to share my experience:
I am using snakemake with a config file written in JSON. The file uses tabs for indentation, which is legal for this purpose. But I am getting the error message: snakemake.exceptions.WorkflowError: Config file is not valid JSON or YAML. I believe this is a bug in snakemake, but I could be wrong. Please comment. After replacing all the tabs with spaces, the error message is gone.
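As a quick check that tab indentation by itself is valid JSON, the sketch below parses a tab-indented document with Python's json module. The keys are made up for illustration, and this says nothing about how snakemake reads its config internally.

```python
import json

# Tabs appear only between tokens here, as indentation.
# The keys "samples" and "threads" are hypothetical.
config_text = '{\n\t"samples": ["a", "b"],\n\t"threads": 4\n}'

config = json.loads(config_text)
print(config)
```

If a document like this parses here but is rejected elsewhere, the difference lies in the other tool's parser rather than in the JSON.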
In a Node-RED flow I was facing the same type of problem:
flow.set("delimiter",'"\t"');
Error:
{ "status": "ERROR", "result": "Cannot parse config: String: 1: in value for key 'delimiter': JSON does not allow unescaped tab in quoted strings, use a backslash escape" }
Solution:
I just used \\t in the code instead:
flow.set("delimiter",'"\\t"');