Python 2.6 JSON decoding performance

Question:

I’m using the json module in Python 2.6 to load and decode JSON files. However, I’m getting slower-than-expected performance. I’m using a test case that is 6 MB in size, and json.loads() is taking 20 seconds.

I thought the json module had some native code to speed up the decoding?

How do I check if this is being used?

As a comparison, I downloaded and installed the python-cjson module, and cjson.decode() is taking 1 second for the same test case.

I’d rather use the JSON module provided with Python 2.6 so that users of my code aren’t required to install additional modules.

(I’m developing on Mac OS X, but I’m getting similar results on Windows XP.)

Asked By: James Austin


Answers:

Looking at my installation of Python 2.6.1 on Windows, the json package loads the _json module, which is built into the runtime. The C source for the json speedups module is here.

>>> import _json
>>> _json
<module '_json' (built-in)>
>>> print _json.__doc__
json speedups
>>> dir(_json)
['__doc__', '__name__', '__package__', 'encode_basestring_ascii', 'scanstring']
>>> 
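
To confirm the speedups are actually wired into the decoder, you can poke at json.decoder; this is an internal, non-public module, so exact names may vary between versions, but in 2.6/2.7 c_scanstring is bound to the _json function when the extension imported cleanly and is None otherwise:

>>> import json.decoder
>>> json.decoder.c_scanstring  # None means the pure-Python fallback is in use
<built-in function scanstring>
>>> json.decoder.scanstring is json.decoder.c_scanstring
True

Also note that the dir() output above lists only encode_basestring_ascii and scanstring: in 2.6 only the string scanner is accelerated, while the main scanner is still pure Python, which goes a long way towards explaining the slow decode times. Python 2.7’s _json adds a C make_scanner, which is why the 2.7 stdlib numbers in other answers here look so much better.
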
Answered By: gimel

It may vary by platform, but the built-in json module is based on simplejson, without the C speedups. I’ve found simplejson to be as fast as python-cjson anyway, and I prefer it since it obviously has the same interface as the built-in module.

try:
    import simplejson as json
except ImportError:
    import json

Seems to me that’s the best idiom for a while yet: you get the performance when it’s available and stay forwards-compatible.

Answered By: A. Coady

Even though _json is available, I’ve noticed that JSON decoding is very slow on CPython 2.6.6. I haven’t compared with other implementations, but I’ve switched to plain string manipulation inside performance-critical loops, as sketched below.
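
For instance, here is a hypothetical fast path (illustrative only, assuming each record is a small, flat JSON object with a known field): a compiled regex can pull out a single value far more cheaply than a full json.loads(), at the price of breaking on escaped quotes or nested structures.

import re

# Hypothetical fast path: extract one integer field from a flat JSON record
# without decoding the whole document. Fragile: no escaping, no nesting.
_id_re = re.compile(r'"id"\s*:\s*(\d+)')

def extract_id(record):
    m = _id_re.search(record)
    return int(m.group(1)) if m is not None else None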

Answered By: Tobu

The new Yajl (Yet Another JSON Library) is very fast.

yajl        serialize: 0.180  deserialize: 0.182  total: 0.362
simplejson  serialize: 0.840  deserialize: 0.490  total: 1.331
stdlib json serialize: 2.812  deserialize: 8.725  total: 11.537

You can compare the libraries yourself.
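
If you want to reproduce the comparison on your own data, a minimal harness along these lines works (a sketch, assuming ujson and yajl are pip-installed, data.json is your test document, and relying on the stdlib-compatible loads() these packages expose):

import timeit

# Time 10 decodes of the same document with each library.
with open('data.json', 'rb') as f:
    raw = f.read()

for name in ('json', 'simplejson', 'ujson', 'yajl'):
    mod = __import__(name)
    seconds = timeit.timeit(lambda: mod.loads(raw), number=10)
    print '%-10s deserialize x10: %.3fs' % (name, seconds)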

Update: UltraJSON is even faster.

Answered By: Ivo Danihelka

I was parsing the same file 10x. File size was 1,856,944 bytes.

Python 2.6:

yajl        serialize: 0.294  deserialize: 0.334  total: 0.627
cjson       serialize: 0.494  deserialize: 0.276  total: 0.769
simplejson  serialize: 0.554  deserialize: 0.268  total: 0.823
stdlib json serialize: 3.917  deserialize: 17.508 total: 21.425

Python 2.7:

yajl        serialize: 0.289  deserialize: 0.312  total: 0.601
cjson       serialize: 0.232  deserialize: 0.254  total: 0.486
simplejson  serialize: 0.288  deserialize: 0.253  total: 0.540
stdlib json serialize: 0.273  deserialize: 0.256  total: 0.528

Not sure why my numbers are so disproportionate to yours. Newer library versions, I guess?

Answered By: Tomas

Take a look at UltraJSON: https://github.com/esnme/ultrajson

Here is my test (code from https://gist.github.com/lightcatcher/1136415):

platform: OS X 10.8.3 MBP 2.2 GHz Intel Core i7

JSON packages tested:

simplejson==3.1.0
python-cjson==1.0.5
jsonlib==1.6.1
ujson==1.30
yajl==0.3.5

JSON Benchmark
2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)]
-----------------------------
ENCODING
simplejson: 0.293394s
cjson: 0.461517s
ujson: 0.222278s
jsonlib: 0.428641s
json: 0.759091s
yajl: 0.388836s

DECODING
simplejson: 0.556367s
cjson: 0.42649s
ujson: 0.212396s
jsonlib: 0.265861s
json: 0.365553s
yajl: 0.361718s
Answered By: TONy.W

For those who are parsing output from a request using the requests package, e.g.:

res = requests.request(...)

text = json.loads(res.text)

This can be very slow for larger response contents: ~45 seconds for 6 MB on my 2017 MacBook. It is not caused by a slow JSON parser, but by slow character-set detection inside the res.text call.

You can solve this by setting the character set yourself before calling res.text, using the cchardet package (see also here):

import cchardet  # pip install cchardet

if res.encoding is None:
    res.encoding = cchardet.detect(res.content)['encoding']

This makes JSON parsing of the response text almost instant!
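
Putting it together, a minimal end-to-end sketch (hypothetical URL; cchardet installed with pip install cchardet):

import json

import cchardet
import requests

# Detect the charset from the raw bytes ourselves so that accessing
# res.text does not fall back to requests' slow detection.
res = requests.get('https://example.com/big.json')
if res.encoding is None:
    res.encoding = cchardet.detect(res.content)['encoding']
data = json.loads(res.text)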

Answered By: ganzpopp