How to parse somewhat wrong JSON with Python?

Question:

I have the following JSON string coming from external input source:

{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}

This is not valid JSON (the keys "id" and "value" must be in double quotes), but I need to parse it anyway. I have tried simplejson and json-py, and it seems neither can be configured to parse such strings.
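For reference, here is a minimal check that a strict parser rejects this input. It assumes Python 2.6+ or Python 3, where the standard `json` module exists (it was not part of Python 2.5). `rejected_by_strict_json` is a hypothetical helper name used only for illustration:

```python
import json

s = '{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}'

def rejected_by_strict_json(text):
    """Return True if the standard json module refuses to parse text."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError on Python 3.5+
        return True
    return False

print(rejected_by_strict_json(s))                          # True: the keys are unquoted
print(rejected_by_strict_json('{"value": "x", "id": 1}'))  # False: this is valid JSON
```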

I am running Python 2.5 on Google App Engine, so C-based solutions such as python-cjson are not an option.

The input format could be changed to XML or YAML instead of the JSON shown above, but I use JSON throughout the project, and switching formats in one specific place would not be ideal.

For now I've switched to XML and am parsing the data successfully, but I'm looking for any solution that would let me switch back to JSON.

Asked By: Serge Tarkovski


Answers:

You could preprocess the string to fix it first. A regular expression could do it, provided the JSON never gets more complicated than this.
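A minimal sketch of that approach, assuming the keys are plain identifiers that appear right after `{` or `,`, and that no string value contains text shaped like `, name:`. `fix_unquoted_keys` is a hypothetical helper name. It uses the standard `json` module (Python 2.6+); on Python 2.5, `simplejson.loads` would play the same role:

```python
import json
import re

# Assumption: a bare key is an identifier directly after "{" or "," (plus
# optional whitespace) and followed by ":". Strings containing that shape
# would be corrupted, which is why this only suits simple inputs.
_BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)')

def fix_unquoted_keys(text):
    """Wrap bare object keys in double quotes so json.loads accepts them."""
    return _BARE_KEY.sub(r'\1"\2"\3', text)

s = '{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}'
data = json.loads(fix_unquoted_keys(s))
print(data["value"], data["id"])
```

Keys that are already quoted are left alone, because the pattern requires an identifier character right after the `{` or `,` and any whitespace.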

Answered By: davidosomething

Since YAML 1.2 is a superset of JSON, you can parse the string with a YAML parser such as PyYAML:

>>> import yaml
>>> s = '{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}'
>>> yaml.safe_load(s)
{'value': '82363549923gnyh49c9djl239pjm01223', 'id': 17893}
Answered By: mykhal

Pyparsing includes a JSON parser example, and its source is available online. You could modify the definition of memberDef to allow an unquoted string as the member name, and then use the modified parser to parse your not-quite-JSON source text.

The August 2008 issue of Python Magazine has much more detailed information about this parser. It shows some sample JSON, along with code that accesses the parsed results as if they were a deserialized object.
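A minimal sketch of that modification, written from scratch against pyparsing 3.x rather than copied from the bundled example. The names `member_name`, `member_def`, and `loads_lenient` are hypothetical. The grammar follows standard JSON, except that an object member name may also be a bare identifier:

```python
import pyparsing as pp

# Punctuation is matched but suppressed, so only the data reaches the results.
LBRACE, RBRACE, LBRACK, RBRACK, COLON, COMMA = map(pp.Suppress, "{}[]:,")

def to_number(tokens):
    """Convert a matched numeric literal to int or float."""
    text = tokens[0]
    return float(text) if any(c in text for c in ".eE") else int(text)

json_string = pp.QuotedString('"', esc_char="\\")
json_number = pp.Regex(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?").set_parse_action(to_number)
json_true = pp.Keyword("true").set_parse_action(pp.replace_with(True))
json_false = pp.Keyword("false").set_parse_action(pp.replace_with(False))
json_null = pp.Keyword("null").set_parse_action(pp.replace_with(None))

json_value = pp.Forward()

# The one deviation from strict JSON: a member name may be a bare identifier
# as well as a double-quoted string.
member_name = json_string | pp.Word(pp.alphas + "_", pp.alphanums + "_")
member_def = pp.Group(member_name + COLON + json_value)

json_object = LBRACE + pp.Optional(member_def + pp.ZeroOrMore(COMMA + member_def)) + RBRACE
json_object.set_parse_action(lambda t: {key: value for key, value in t})

json_array = LBRACK + pp.Optional(json_value + pp.ZeroOrMore(COMMA + json_value)) + RBRACK
# Wrap the list so pyparsing keeps it as one token instead of splicing it in.
json_array.set_parse_action(lambda t: [list(t)])

json_value <<= json_string | json_number | json_object | json_array | json_true | json_false | json_null

def loads_lenient(text):
    """Parse JSON-like text whose object keys may be unquoted."""
    return json_value.parse_string(text, parse_all=True)[0]

print(loads_lenient('{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}'))
```

Converting objects and arrays to plain `dict` and `list` inside parse actions is a design choice for this sketch. The bundled example instead exposes the parsed results through pyparsing's own result objects.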

Answered By: PaulMcG

You can use demjson.

>>> import demjson
>>> demjson.decode('{foo:3}')
{u'foo': 3}
Answered By: null

The dirtyjson library can handle some almost-correct JSON:

>>> import dirtyjson
>>> 
>>> s = '{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}'
>>> d = dirtyjson.loads(s)
>>> d
AttributedDict([('value', '82363549923gnyh49c9djl239pjm01223'), ('id', 17893)])
>>>
>>> d = dict(d)
>>> d
{'value': '82363549923gnyh49c9djl239pjm01223', 'id': 17893}
>>> d["value"]
'82363549923gnyh49c9djl239pjm01223'
>>> d["id"]
17893
Answered By: Gino Mempin