Dump to JSON adds additional double quotes and escaping of quotes
Question:
I am retrieving Twitter data with a Python tool and dumping it to disk in JSON format. I noticed some unintended escaping: the entire data string for a tweet is enclosed in double quotes, and all double quotes of the actual JSON formatting are escaped with a backslash.
They look like this:
"{\"created_at\":\"Fri Aug 08 11:04:40 +0000 2014\",\"id\":497699913925292032,
How do I avoid that? It should be:
{"created_at":"Fri Aug 08 11:04:40 +0000 2014" ...
My file-out code looks like this:
with io.open('data' + self.timestamp + '.txt', 'a', encoding='utf-8') as f:
    f.write(unicode(json.dumps(data, ensure_ascii=False)))
    f.write(unicode('\n'))
The unintended escaping causes problems when reading in the JSON file in a later processing step.
Answers:
You are double-encoding your JSON strings. data is already a JSON string and doesn't need to be encoded again:
>>> import json
>>> not_encoded = {"created_at":"Fri Aug 08 11:04:40 +0000 2014"}
>>> encoded_data = json.dumps(not_encoded)
>>> print encoded_data
{"created_at": "Fri Aug 08 11:04:40 +0000 2014"}
>>> double_encode = json.dumps(encoded_data)
>>> print double_encode
"{\"created_at\": \"Fri Aug 08 11:04:40 +0000 2014\"}"
Just write these directly to your file:
with open('data{}.txt'.format(self.timestamp), 'a') as f:
    f.write(data + '\n')
Another situation where this unwanted escaping can happen is if you use json.dump() on the pre-serialized output of json.dumps(). For example:
import json, sys
json.dump({"foo": json.dumps([{"bar": 1}, {"baz": 2}])}, sys.stdout)
will result in
{"foo": "[{\"bar\": 1}, {\"baz\": 2}]"}
To avoid this, you need to pass dictionaries rather than the output of json.dumps(), e.g.
json.dump({"foo": [{"bar": 1}, {"baz": 2}]}, sys.stdout)
which outputs the desired
{"foo": [{"bar": 1}, {"baz": 2}]}
(Why would you pre-process the inner list with json.dumps(), you ask? Well, I had another function that built that inner list from other data, and I thought it would make sense to return a JSON string from that function… Wrong.)
Extending this for others with a similar issue: I used the following to dump JSON-formatted data to a file, where the data came from an API call. It is only an indicative example; adapt it to your requirements.
import json

# Below is an example; for me this string came from an API call.
json_string = '{"address":{"city":"NY", "country":"USA"}}'

# Write the JSON string directly to a file (don't re-encode it with
# json.dumps, as explained in the other answers).
with open('direct_json.json', 'w') as direct_json:
    direct_json.write(json_string)
    direct_json.write("\n")

# Load as a dict.
json_dict = json.loads(json_string)

# Pretty print.
print(json.dumps(json_dict, indent=1))

# Write pretty JSON to a file.
with open('formatted.json', 'w') as formatted_file:
    json.dump(json_dict, formatted_file, indent=4)
A simple way to get around this, which worked for me, is to use json.loads() on the string before dumping, like the following:
import json

# The data is already a JSON string, so parse it first...
data = json.loads('{"foo": [{"bar": 1}, {"baz": 2}]}')

# ...then dump the resulting dict.
with open('output.json', 'w') as f:
    json.dump(data, f, indent=4)
With the third-party ujson library, set escape_forward_slashes=False to prevent escaping of / characters.
Solved:
>>> import ujson
>>> ujson.dumps({"a": "aa//a/dfdf"}, escape_forward_slashes=False)
'{"a":"aa//a/dfdf"}'
Default:
>>> ujson.dumps({"a": "aa//a/dfdf"}, escape_forward_slashes=True)
'{"a":"aa\\/\\/a\\/dfdf"}'