Why should json.loads be preferred to ast.literal_eval for parsing JSON?

Question:

I have a dictionary that is stored in a db field as a string. I am trying to parse it into a dict, but json.loads gives me an error.

Why does json.loads fail on this and ast.literal_eval works? Is one preferable over the other?

>>> c.iframe_data
u"{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}"

# json fails
>>> json.loads(c.iframe_data)
Traceback (most recent call last):
ValueError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

# ast.literal_eval works
>>> ast.literal_eval(c.iframe_data)
{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
Asked By: David542


Answers:

json.loads fails because your c.iframe_data value is not a valid JSON document. In a valid JSON document, strings are quoted with double quotes, and there is no such thing as a u prefix for marking strings as unicode.

json.loads(c.iframe_data) means "deserialize the JSON document contained in c.iframe_data".

ast.literal_eval is for cases where you would otherwise reach for eval: when your input is a Python literal expression that you want to evaluate safely.

Is one preferable over the other?

It depends on the data. See this answer for more context.
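To make the trade-off concrete, here is a small sketch contrasting the two parsers (standard library only; the sample literals are illustrative, not from the question):

```python
import ast
import json

# ast.literal_eval accepts Python literal syntax that JSON rejects
# (single quotes, tuples, None, ...):
print(ast.literal_eval("{'a': (1, 2), 'b': None}"))  # {'a': (1, 2), 'b': None}

# json.loads accepts only strict JSON (double quotes, null instead of None):
print(json.loads('{"a": [1, 2], "b": null}'))  # {'a': [1, 2], 'b': None}

# Feeding Python-literal syntax to json.loads fails, just as in the question:
try:
    json.loads("{'a': 1}")
except ValueError as exc:
    print("json.loads rejected it:", exc)
```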

Answered By: styvane

Because u"{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}" is the repr of a Python dict as a unicode string, not JavaScript Object Notation. In the Chrome console:

bad = {u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
SyntaxError: Unexpected string
good = {'person': 'Annabelle!', 'csrfmiddlewaretoken': 'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
Object {person: "Annabelle!", csrfmiddlewaretoken: "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}

Or you can try yaml, though note in the session below that it treats the u"..." prefixes as literal parts of the strings rather than as unicode markers:

>>> a = '{"person": "Annabelle!", "csrfmiddlewaretoken": "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'
>>> json.loads(a)
{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
>>> import ast
>>> ast.literal_eval(a)
{'person': 'Annabelle!', 'csrfmiddlewaretoken': 'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
>>> import yaml
>>> a = '{u"person": u"Annabelle!", u"csrfmiddlewaretoken": u"wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'
>>> yaml.load(a)
{'u"person"': 'u"Annabelle!"', 'u"csrfmiddlewaretoken"': 'u"wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"'}
>>> a = u'{u"person": u"Annabelle!", u"csrfmiddlewaretoken": u"wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'
>>> yaml.load(a)
{'u"person"': 'u"Annabelle!"', 'u"csrfmiddlewaretoken"': 'u"wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"'}
Answered By: lqhcpsgbl

json.loads is used specifically to parse JSON which is quite a restrictive format. There is no u'...' syntax and all strings are delimited by double quotes, not single quotes. Use json.dumps to serialise something that can be read by json.loads.

So json.loads(string) is the inverse of json.dumps(object) whereas ast.literal_eval(string) is (vaguely) the inverse of repr(object).
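A minimal sketch of that inverse relationship, using an illustrative dict:

```python
import ast
import json

obj = {"person": "Annabelle!", "count": 3}

# json.dumps -> json.loads round-trips the object through JSON text:
assert json.loads(json.dumps(obj)) == obj

# repr -> ast.literal_eval round-trips it through Python literal syntax
# (this only holds for simple literal types: str, int, float, dict, list, ...):
assert ast.literal_eval(repr(obj)) == obj
```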

JSON is nice because it’s portable — there are parsers for it trivially available in pretty much every language. So if you want to send JSON to a Javascript frontend you’ll have no issues.

ast.literal_eval isn’t easily portable but it’s slightly richer: you can use tuples, sets, and dicts whose keys aren’t restricted to strings, for example.

Also json.loads is significantly faster than ast.literal_eval.
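If you want to check that claim on your own data, a rough timeit sketch (absolute numbers will vary by interpreter and input; the document below happens to be valid in both syntaxes):

```python
import ast
import json
import timeit

# The same document parses both as JSON and as a Python literal:
doc = '{"person": "Annabelle!", "token": "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'
assert json.loads(doc) == ast.literal_eval(doc)

# Time each parser on it; json.loads is typically the faster of the two:
t_json = timeit.timeit(lambda: json.loads(doc), number=10_000)
t_ast = timeit.timeit(lambda: ast.literal_eval(doc), number=10_000)
print(f"json.loads:       {t_json:.3f}s")
print(f"ast.literal_eval: {t_ast:.3f}s")
```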

Answered By: Andrew Magee

I have a dictionary that is stored in a db field as a string.

This is a design fault. While it’s perfectly possible, as someone appears to have done, to extract the repr of a dictionary, there’s no guarantee that the repr of an object can be evaluated at all.

In the presence of only string keys and string and numeric values, most times the Python eval function will reproduce the value from its repr, but I am unsure why you think that this would make it valid JSON, for example.

I am trying to parse it into a dict, but json.loads gives me an error.

Naturally. You aren’t storing JSON in the database, so it hardly seems reasonable to expect it to parse as JSON. While it’s interesting that ast.literal_eval can be used to parse the value, again there are no guarantees beyond relatively simple Python types.

Since it appears your data is indeed limited to such types, the real solution to your problem is to correct the way the data is stored, by converting the dictionary to a string with json.dumps before storage in the database. Some database systems (e.g., PostgreSQL) have JSON types to make querying such data simpler, and I’d recommend you use such types if they are available to you.
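A minimal sketch of the fix on the write path (the variable names are illustrative, not from the question):

```python
import json

profile = {
    "person": "Annabelle!",
    "csrfmiddlewaretoken": "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN",
}

# On write: serialize explicitly with json.dumps, never str()/repr():
stored = json.dumps(profile)  # this is the text that goes into the db field

# On read: the stored text is valid JSON and parses back losslessly:
assert json.loads(stored) == profile
```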

As to which is “better,” that will always depend on the specific application, but JSON was explicitly designed as a compact human-readable machine-parseable format for simple structured data, whereas your current representation is based on formats specific to Python, which (for example) would be tediously difficult to evaluate in other languages. JSON is the applicable standard here, and you will benefit from using it.

Answered By: holdenweb

First, and most importantly, do not serialize data twice. Your database is itself a serialization of data, with a rich and expressive set of tools to query, explore, manipulate, and present it. Serializing data to be subsequently placed in a database eliminates the possibility for isolated sub-component updates, sub-component querying & indexing, and couples all writes to mandatory initial reads, for a few of the most significant issues.

Next, JavaScript Object Notation (JSON) is a limited subset of the JavaScript language suitable for the representation of static data in service of data interchange. As a subset of the language, you can naively eval it within JS to reconstruct the original object. It is a simple serialization (no advanced features such as internal references, template definition, or type extension), with the limitations of the JavaScript language baked in, and with penalties for strings requiring large amounts of escaping. The use of end markers also makes it difficult to utilize in purely streaming scenarios, e.g. you can't "finalize" an object until hitting its paired }, and it likewise has no marker for record separation. Other notable limitations: delivering HTML within JSON requires excessive escaping; all numbers are notionally floating point (53 bits of exact integer accuracy, rounding errors, ...), making it patently unsuitable for the storage or transfer of financial information or for technologies (e.g. crypto) requiring true 64-bit integers; and there is no native date representation.

There are some significant differences between JS and Python as languages, and thus in how JSON “JavaScript Object Notation” vs. PLS (Python Literal Syntax) behave. It just so happens that for the purpose of literal definition, most of JavaScript literal syntax is directly compatible with Python, albeit with slightly differing interpretations. The reverse is not true, see the above examples of disparity. If you care about preserving the fidelity of your data for Python, Python literals are more expressive and less “lossy” than their JS equivalents. However, as other answers/comments have noted, repr() is not a reliable way to generate this representation; Python literal syntax is not meant to be used this way. For the greatest type fidelity I generally recommend YAML serialization, of which JSON is a fully valid subset.
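To illustrate the "lossy" point with the standard library (illustrative data; note the int key and the tuple survive the Python-literal round-trip but not the JSON one):

```python
import ast
import json

# Python-only features: an int key and a tuple value.
data = {1: "int key", "t": (1, 2)}

# A JSON round-trip is lossy: keys are coerced to strings, tuples become lists.
roundtripped = json.loads(json.dumps(data))
print(roundtripped)  # {'1': 'int key', 't': [1, 2]}

# The Python-literal round-trip preserves both:
assert ast.literal_eval(repr(data)) == data
```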

FYI, to address the practical concern of storage of dictionary-like mappings associated with entities, there are entity-attribute-value data models. Arbitrary key-value stores in relational databases FTW, but with power comes responsibility. Use this pattern carefully and only when absolutely needed. (If this is a frequent pattern, look into document stores.)

Answered By: amcgregor

json.loads should strongly be preferred to ast.literal_eval for parsing JSON, for all the reasons below (summarizing other posters).

In your specific example, the input is malformed JSON that was exported the wrong way from Python 2.x (hence all the unwanted, illegal u' prefixes); note that Python 2.x is itself past end-of-life, so please move to 3.x. You can preprocess the string into legal JSON with a regex:

>>> import json
>>> import re
>>> malformed_json = u"{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}"

>>> legal_json = re.sub(r"u'([^']*)'", r'"\1"', malformed_json)
>>> legal_json
'{"person": "Annabelle!", "csrfmiddlewaretoken": "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'

>>> json.loads(legal_json)
{'person': 'Annabelle!', 'csrfmiddlewaretoken': 'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
  • (Note: if your architecture has lots of malformed JSON strings exported the wrong way from 2.x and stored in a DB, that is not a legitimate reason to avoid json.loads, but it is a reason to revisit your architecture. At the very least, run the fixup regex over all your stored strings once and write the legal JSON back.)

json.loads Pros/Cons:

  • handles all legal JSON, unlike ast.literal_eval

  • slow. There are much faster JSON libraries like ultrajson, yajl, simplejson etc. Also, on large import jobs you can use multiprocessing/multithreading (which also gives you protection from memory leaks, which is a common issue with all parsers).

  • numerical fields: JSON numbers are notionally doubles, so consumers that treat them as such (e.g. JavaScript) can lose precision on large integers (@amcgregor)
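A quick sketch of that caveat: CPython's own json module parses JSON integers as exact ints, but any consumer that maps JSON numbers to IEEE doubles will round large values:

```python
import json

big = 2**53 + 1  # 9007199254740993 has no exact double representation

# Python's json module round-trips the integer exactly:
assert json.loads(json.dumps(big)) == big

# But a consumer storing JSON numbers as doubles (e.g. JavaScript)
# would round it to the nearest representable value:
assert float(big) == 2**53  # 9007199254740992
```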

Answered By: smci

In my case, I moved away from ast.literal_eval(selected_cell_from_db): it could evaluate the single-quoted dict stored in the db field, but it did not give me the JSON-compatible data I needed for the DeepDiff package.

What I actually needed was to save the object with json.dumps(obj_to_save_to_db) instead of str(obj_to_save_to_db) (json.dumps creates a double-quoted, JSON-readable string; str() does not), and then read it back with json.loads(selected_cell_from_db).

Answered By: Michael Wegter