Python 2.6 JSON decoding performance
Question:
I’m using the json module in Python 2.6 to load and decode JSON files. However, I’m getting slower than expected performance: on a 6 MB test case, json.loads() takes 20 seconds.
I thought the json module had some native code to speed up the decoding? How do I check if this is being used?
As a comparison, I downloaded and installed the python-cjson module, and cjson.decode() takes 1 second for the same test case.
I’d rather use the JSON module provided with Python 2.6 so that users of my code aren’t required to install additional modules.
(I’m developing on Mac OS X, but I’m getting similar results on Windows XP.)
Answers:
Looking in my installation of Python 2.6.1 on Windows, the json package loads the _json module, which is built into the runtime. The C source for the json speedups module is Modules/_json.c in the CPython source tree.
>>> import _json
>>> _json
<module '_json' (built-in)>
>>> print _json.__doc__
json speedups
>>> dir(_json)
['__doc__', '__name__', '__package__', 'encode_basestring_ascii', 'scanstring']
>>>
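Assuming the Python 2.6 layout of json/decoder.py (which binds scanstring to the C version when _json imports cleanly, and falls back to a pure-Python implementation otherwise), a quick sanity check, sketched below, is to see which implementation the decoder actually picked up:
import json.decoder

# In CPython 2.6, json/decoder.py does roughly:
#   try:
#       from _json import scanstring as c_scanstring
#   except ImportError:
#       c_scanstring = None
#   scanstring = c_scanstring or py_scanstring
# so comparing the bound name against c_scanstring shows which one won.
if json.decoder.scanstring is getattr(json.decoder, 'c_scanstring', None):
    print 'C speedups in use'
else:
    print 'pure-Python scanstring in use'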
It may vary by platform, but the builtin json module is based on simplejson, not including the C speedups. I’ve found simplejson to be as fast as python-cjson anyway, so I prefer it, since it obviously has the same interface as the builtin.
try:
    import simplejson as json
except ImportError:
    import json
Seems to me that’s the best idiom for a while yet: you get the better performance when simplejson is available, while staying forwards-compatible.
Even though _json is available, I’ve noticed json decoding is very slow on CPython 2.6.6. I haven’t compared with other implementations, but I’ve switched to string manipulation when inside performance-critical loops.
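As a hedged illustration of that kind of shortcut (the field name and record layout here are hypothetical, and this is only safe when you control the producer’s exact output format):
# Pull a single known field out of a JSON record without a full parse.
line = '{"id": 123, "name": "widget"}'
start = line.index('"id": ') + len('"id": ')
end = line.index(',', start)
record_id = int(line[start:end])  # -> 123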
The new Yajl (Yet Another JSON Library) is very fast.
yajl serialize: 0.180 deserialize: 0.182 total: 0.362
simplejson serialize: 0.840 deserialize: 0.490 total: 1.331
stdlib json serialize: 2.812 deserialize: 8.725 total: 11.537
You can compare the libraries yourself.
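As a minimal sketch of such a comparison (the test.json filename is a placeholder, and this is rough wall-clock timing rather than a rigorous benchmark):
import time
import json

def bench(name, dumps, loads, obj, n=10):
    # Time n rounds of serialize, then n rounds of deserialize.
    t0 = time.time()
    for _ in xrange(n):
        s = dumps(obj)
    t1 = time.time()
    for _ in xrange(n):
        loads(s)
    t2 = time.time()
    print '%-12s serialize: %.3f deserialize: %.3f total: %.3f' % (
        name, t1 - t0, t2 - t1, t2 - t0)

data = json.load(open('test.json'))  # placeholder test file
bench('stdlib json', json.dumps, json.loads, data)
try:
    import simplejson
    bench('simplejson', simplejson.dumps, simplejson.loads, data)
except ImportError:
    pass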
Update: UltraJSON is even faster.
I was parsing the same file 10x. File size was 1,856,944 bytes.
Python 2.6:
yajl serialize: 0.294 deserialize: 0.334 total: 0.627
cjson serialize: 0.494 deserialize: 0.276 total: 0.769
simplejson serialize: 0.554 deserialize: 0.268 total: 0.823
stdlib json serialize: 3.917 deserialize: 17.508 total: 21.425
Python 2.7:
yajl serialize: 0.289 deserialize: 0.312 total: 0.601
cjson serialize: 0.232 deserialize: 0.254 total: 0.486
simplejson serialize: 0.288 deserialize: 0.253 total: 0.540
stdlib json serialize: 0.273 deserialize: 0.256 total: 0.528
I’m not sure why my numbers are so out of proportion to yours. Newer library versions, I guess?
Take a look at UltraJSON: https://github.com/esnme/ultrajson
Here is my test (code from https://gist.github.com/lightcatcher/1136415).
Platform: OS X 10.8.3, MacBook Pro, 2.2 GHz Intel Core i7
JSON:
simplejson==3.1.0
python-cjson==1.0.5
jsonlib==1.6.1
ujson==1.30
yajl==0.3.5
JSON Benchmark
2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)]
-----------------------------
ENCODING
simplejson: 0.293394s
cjson: 0.461517s
ujson: 0.222278s
jsonlib: 0.428641s
json: 0.759091s
yajl: 0.388836s
DECODING
simplejson: 0.556367s
cjson: 0.42649s
ujson: 0.212396s
jsonlib: 0.265861s
json: 0.365553s
yajl: 0.361718s
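ujson is intended as a mostly drop-in replacement for the dumps/loads half of the stdlib API, though it supports fewer keyword arguments; a minimal usage sketch:
import ujson

s = ujson.dumps({'answer': 42})  # '{"answer":42}'
obj = ujson.loads(s)             # {'answer': 42}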
For those who are parsing output from a request using the requests package, e.g.:
res = requests.request(...)
text = json.loads(res.text)
This can be very slow for larger response contents, say ~45 seconds for 6 MB on my 2017 MacBook. The culprit is not a slow JSON parser, but the slow character set detection performed by the res.text call.
You can solve this by setting the character set yourself before calling res.text, using the cchardet package:
import cchardet

if res.encoding is None:
    res.encoding = cchardet.detect(res.content)['encoding']
This makes parsing the response text as JSON almost instant!
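Putting it together (the URL is a placeholder; requests only falls back to its slow chardet-based detection when the server didn’t declare a charset, which is why the encoding-is-None guard is the right condition):
import json
import cchardet
import requests

res = requests.get('https://example.com/big.json')  # placeholder URL
if res.encoding is None:
    # Avoid requests' slow apparent_encoding (chardet) path in res.text.
    res.encoding = cchardet.detect(res.content)['encoding']
data = json.loads(res.text)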