Running JSON through Python's eval()?

Question:

DO NOT DO THIS.

This question is still getting upvotes, so I wanted to add a warning to it. If you’re using Python 3, just use the included json package. If you’re using Python 2, do everything you can to move to Python 3. If you’re prevented from using Python 3 (my condolences), use the simplejson package suggested by James Thompson.
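For the Python 3 route, here is a minimal sketch using json.loads. The payload and its field names are made up for illustration:

```python
import json

# Hypothetical commit payload; the field names are illustrative only.
payload = '{"revision": 42, "deleted": false, "parent": null}'

# json.loads maps JSON true/false/null to Python True/False/None
# and never executes anything it is given.
data = json.loads(payload)
print(data)  # {'revision': 42, 'deleted': False, 'parent': None}
```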

Original question follows.


Best practices aside, is there a compelling reason not to do this?

I’m writing a post-commit hook for use with a Google Code project, which provides commit data via a JSON object. GC provides an HMAC authentication token along with the request (outside the JSON data), so by validating that token I gain high confidence that the JSON data is both benign (as there’s little point in distrusting Google) and valid.
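The question's reasoning rests on the HMAC check. As a rough Python 3 sketch of "verify, then parse", the function name, the hex token format, the shared secret and the SHA-256 digest below are all assumptions, not Google Code's documented scheme:

```python
import hashlib
import hmac
import json


def parse_signed_payload(body: bytes, received_hex: str, secret: bytes):
    """Verify an HMAC over the raw request body, then parse it as JSON.

    Assumptions (hypothetical, not from any Google Code docs): the token is
    a hex digest, the key is a shared secret, and the digest is SHA-256.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected, received_hex.lower()):
        raise ValueError("HMAC mismatch; refusing to parse body")
    # Even after authentication, parse the data instead of evaluating it.
    return json.loads(body)


# Usage with made-up values:
secret = b"hypothetical-shared-secret"
body = b'{"revision": 42}'
token = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(parse_signed_payload(body, token, secret))  # {'revision': 42}
```

Authenticating the sender only tells you who sent the bytes; it says nothing about whether eval will interpret them the way a JSON parser would.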

My own (brief) investigations suggest that JSON happens to be completely valid Python, with the exception of the "\/" escape sequence, which GC doesn’t appear to generate.
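The "\/" escape is enough on its own to produce silently different data. A small Python 3 sketch (the URL is invented) that feeds the same text to json.loads and eval:

```python
import json
import warnings

# JSON text using the "\/" escape (an escaped forward slash); the URL is made up.
doc = r'{"url": "http:\/\/example.com\/"}'

parsed = json.loads(doc)  # JSON decodes "\/" to "/"

with warnings.catch_warnings():
    # Recent Python 3 versions warn about the unrecognized "\/" escape; hide that here.
    warnings.simplefilter("ignore")
    evaluated = eval(doc)  # Python keeps the backslash in "\/"

print(parsed["url"])     # http://example.com/
print(evaluated["url"])  # http:\/\/example.com\/
```

No exception is raised in either case, which is what makes this kind of mismatch easy to miss.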

So, as I’m working with Python 2.4 (i.e. no json module), eval() is looking really tempting.

Edit: For the record, I am very much not asking if this is a good idea. I’m quite aware that it isn’t, and I very much doubt I’ll ever use this technique for any future projects even if I end up using it for this one. I just wanted to make sure that I know what kind of trouble I’ll run into if I do. 🙂

Asked By: Ben Blank


Answers:

The point of best practices is that in most cases, it’s a bad idea to disregard them. If I were you, I’d use a parser to parse JSON into Python. Try out simplejson: it was very straightforward for parsing JSON when I last tried it, and it claims to be compatible with Python 2.4.

I disagree that there’s little point in distrusting Google. I wouldn’t distrust them, but I’d verify the data you get from them. The reason that I’d actually use a JSON parser is right in your question:

My own (brief) investigations suggest that JSON happens to be completely valid Python, with the exception of the “\/” escape sequence, which GC doesn’t appear to generate.

What makes you think that Google Code will never generate an escape sequence like that?

Parsing is a solved problem if you use the right tools. If you try to take shortcuts like this, you’ll eventually get bitten by incorrect assumptions, or you’ll end up trying to hack together a parser with regexes and boolean logic when a parser already exists for your language of choice.

Answered By: James Thompson

If you’re comfortable with your script working fine for a while, and then randomly failing on some obscure edge case, I would go with eval.

If it’s important that your code be robust, I would take the time to add simplejson. You don’t need the C portion for speedups, so it really shouldn’t be hard to dump a few .py files into a directory somewhere.
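A common way to act on this is an import fallback. This sketch assumes the vendored simplejson .py files sit somewhere on sys.path when running on an old interpreter:

```python
# Prefer the standard-library json module (added in Python 2.6) and fall back
# to simplejson, e.g. a few vendored .py files, on older interpreters such as 2.4.
try:
    import json
except ImportError:
    import simplejson as json

# Either module exposes loads(), so the calling code stays the same.
print(json.loads('{"a": 1, "b": [true, null]}'))  # on Python 3: {'a': 1, 'b': [True, None]}
```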

As an example of something that might bite you, JSON uses Unicode and simplejson returns Unicode strings, whereas in Python 2 eval returns str:

>>> import simplejson
>>> simplejson.loads('{"a":1, "b":2}')
{u'a': 1, u'b': 2}
>>> eval('{"a":1, "b":2}')
{'a': 1, 'b': 2}

Edit: a better example of where eval() behaves differently:

>>> simplejson.loads('{"X": "\uabcd"}')
{u'X': u'\uabcd'}
>>> eval('{"X": "\uabcd"}')
{'X': '\\uabcd'}
>>> simplejson.loads('{"X": "\uabcd"}') == eval('{"X": "\uabcd"}')
False

Edit 2: saw yet another problem today, pointed out by SilentGhost: eval doesn’t map JSON’s true, false and null to Python’s True, False and None.

>>> simplejson.loads('[false, true, null]')
[False, True, None]
>>> eval('[false, true, null]')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'false' is not defined

Answered By: Kiv

evaling JSON is a bit like trying to run XML through a C++ compiler.

eval is meant to evaluate Python code. Although there are some syntactical similarities, JSON isn’t Python code. Heck, not only is it not Python code, it’s not code to begin with. Therefore, even if you can get away with it for your use-case, I’d argue that it’s a bad idea conceptually. Python is an apple, JSON is orange-flavored soda.

Answered By: Jason Baker

One major difference is that a boolean in JSON is true|false, but Python uses True|False.

The most important reason not to do this can be generalized: eval should never be used to interpret external input since this allows for arbitrary code execution.
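To make the risk concrete, here is a Python 3 sketch with a harmless stand-in payload (invented for this example): eval executes it, while json.loads rejects it as invalid JSON.

```python
import json

# Looks almost like JSON, but it is a Python expression that imports a module
# and calls into it. getcwd() is harmless; os.system() would work just as well.
payload = '{"cwd": __import__("os").getcwd()}'


def parse_or_none(text):
    """Return the decoded JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


evaluated = eval(payload)        # runs the import and the call
parsed = parse_or_none(payload)  # rejected before anything runs

print(evaluated)  # the current working directory, fetched by code in the payload
print(parsed)     # None
```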

Answered By: eradman