How to serialize Python objects in a human-readable format?

Question:

I need to store Python structures made of lists, dictionaries, and tuples in a human-readable format. The idea is to use something similar to pickle, but pickle is not human-friendly. Other options that come to mind are YAML (through PyYAML) and JSON (through simplejson).

Any other option that comes to your mind?

Asked By: pistacchio


Answers:

For simple cases pprint() and eval() come to mind.

Using your example:

>>> d = {'age': 27,
...  'name': 'Joe',
...  'numbers': [1, 
...              2, 
...              3,
...              4,
...              5],
...  'subdict': {
...              'first': 1, 
...              'second': 2,
...               'third': 3
...              }
... }
>>> 
>>> from pprint import pprint
>>> pprint(d)
{'age': 27,
 'name': 'Joe',
 'numbers': [1, 2, 3, 4, 5],
 'subdict': {'first': 1, 'second': 2, 'third': 3}}
>>> 

I would think twice about meeting two requirements with the same tool. Have you considered using pickle for serialization and then pprint() (or a fancier object viewer) for humans looking at the objects?
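
The pprint()/eval() round trip mentioned at the start of this answer can be sketched as follows. This is a minimal sketch, not the answerer's code: it swaps eval() for ast.literal_eval(), which only accepts Python literals and so avoids executing arbitrary code from the file. The extra 'point' key is an assumption added to show that tuples survive the round trip.

```python
import ast
from pprint import pformat

# Example data; the 'point' tuple is added here for illustration only.
d = {'age': 27,
     'name': 'Joe',
     'numbers': [1, 2, 3, 4, 5],
     'subdict': {'first': 1, 'second': 2, 'third': 3},
     'point': (1, 2)}

# pformat() returns the same text that pprint() would print,
# so it can be written to a file and read by a human.
text = pformat(d)

# literal_eval() parses the text back, accepting only literals
# (strings, numbers, tuples, lists, dicts, sets, booleans, None).
restored = ast.literal_eval(text)

print(text)
print(restored == d)
```

This only works for structures built entirely from literals; custom class instances would need a different approach.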

Answered By: PEZ

If it’s just Python list, dictionary, and tuple objects, JSON is the way to go. It’s human-readable, very easy to handle, and language-independent too.

Caution: Tuples will be converted to lists in simplejson.

In [109]: simplejson.loads(simplejson.dumps({'d':(12,3,4,4,5)}))
Out[109]: {u'd': [12, 3, 4, 4, 5]}
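
The same caveat can be checked with the standard-library json module, which is used here instead of simplejson so the snippet has no external dependency. A minimal sketch:

```python
import json

original = {'d': (12, 3, 4, 4, 5)}

# JSON has only one sequence type (array), so the tuple is written
# as an array and comes back as a list.
restored = json.loads(json.dumps(original))

print(type(original['d']).__name__)  # tuple
print(type(restored['d']).__name__)  # list
```

If the distinction between tuples and lists matters, JSON alone will not preserve it.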
Answered By: JV.

To use simplejson, first install it with easy_install simplejson:

import simplejson
my_structure = {"name":"Joe", "age":27, "numbers":[1,2,3,4,5], "subdict":{"first":1, "second":2, "third": 3}}
json = simplejson.dumps(my_structure)

results in json being:

{"age": 27, "subdict": {"second": 2, "third": 3, "first": 1}, "name": "Joe", "numbers": [1, 2, 3, 4, 5]}

Notice that it has hardly changed the format of the dictionary at all, but you should still run it through this step to ensure valid JSON data.

You can also pretty-print the structure itself:

import pprint
pprint.pprint(my_structure)

results in:

{'age': 27,
 'name': 'Joe',
 'numbers': [1, 2, 3, 4, 5],
 'subdict': {'first': 1, 'second': 2, 'third': 3}}
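
If the goal is readable JSON text rather than a readable Python literal, the serializer itself can do the indenting. The sketch below assumes the standard-library json module; its dumps() takes the same indent and sort_keys arguments as simplejson.dumps().

```python
import json

my_structure = {"name": "Joe", "age": 27, "numbers": [1, 2, 3, 4, 5],
                "subdict": {"first": 1, "second": 2, "third": 3}}

# indent=4 puts each item on its own line; sort_keys=True makes the
# key order stable, which helps when diffing two saved files.
pretty = json.dumps(my_structure, indent=4, sort_keys=True)
print(pretty)

# The indented text is still valid JSON and loads back unchanged.
print(json.loads(pretty) == my_structure)
```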
Answered By: Soviut

If you’re after more representations than are covered by JSON, I highly recommend checking out PyON (Python Object Notation), although I believe it’s restricted to Python 2.6/3.0 and above, as it relies on the ast module. It handles custom class instances and recursive data types, amongst other features, which is more than JSON provides.

Answered By: Matthew Trevor

You should check out jsonpickle (https://github.com/jsonpickle/jsonpickle). It will write out any Python object into a JSON file. You can then read that file back into a Python object. The nice thing is that the intermediate file is very readable because it’s JSON.
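
To give a feel for the general idea of storing custom objects as readable JSON, here is a minimal sketch using only the standard-library json module. It is not jsonpickle's implementation or file format: the Point class, the "__class__" key, and the REGISTRY table are all hypothetical names made up for this illustration.

```python
import json


class Point:
    """Hypothetical example class."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


TYPE_KEY = "__class__"          # hypothetical tag name, chosen for this sketch
REGISTRY = {"Point": Point}     # classes that may be rebuilt when loading


def encode(obj):
    """Called by json.dumps() for objects it cannot serialize itself."""
    name = type(obj).__name__
    if name in REGISTRY:
        return {TYPE_KEY: name, **vars(obj)}
    raise TypeError("cannot serialize %r" % (obj,))


def decode(mapping):
    """Called by json.loads() for every JSON object it parses."""
    if TYPE_KEY in mapping:
        fields = {k: v for k, v in mapping.items() if k != TYPE_KEY}
        return REGISTRY[mapping[TYPE_KEY]](**fields)
    return mapping


text = json.dumps({"origin": Point(0, 0), "tip": Point(3, 4)},
                  default=encode, indent=2, sort_keys=True)
print(text)

restored = json.loads(text, object_hook=decode)
print(restored["tip"] == Point(3, 4))
```

Restricting reconstruction to an explicit registry keeps loading from instantiating arbitrary classes named in the file.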

Answered By: Paul Hildebrandt

AXON is a textual format that combines the best of JSON, XML, and YAML.
It is quite readable and relatively compact.

The Python (2.7/3.3-3.7) module pyaxon supports load(s)/dump(s) functionality, including iterative loading/dumping. It’s fast enough to be useful.

Consider a simple example:

>>> import axon
>>> d = {
...     'age': 27, 'name': 'Joe',
...     'numbers': [1, 2, 3, 4, 5],
...     'subdict': {'first': 1, 'second': 2, 'third': 3}
... }
# pretty form
>>> axon.dumps(d, pretty=1)
{ age: 27
  name: "Joe"
  numbers: [1 2 3 4 5]
  subdict: {
    first: 1
    second: 2
    third: 3}}
# compact form
>>> axon.dumps(d)
{age:27 name:"Joe" numbers:[1 2 3 4 5] subdict:{first:1 second:2 third:3}}

It can also handle multiple objects in one message:

>>> msg = axon.dumps([{'a':1, 'b':2, 'c':3}, {'a':2, 'b':3, 'c':4}, {'a':3, 'b':4, 'c':5}])
>>> print(msg)
{a:1 b:2 c:3}
{a:2 b:3 c:4}
{a:3 b:4 c:5}

and then load them iteratively:

for d in axon.iloads(msg):
   print(d)
Answered By: intellimath

What do you mean this is not human-readable??? 😉

>>> d = {'age': 27, 
...   'name': 'Joe',
...   'numbers': [1,2,3,4,5],
...   'subdict': {'first':1, 'second':2, 'third':3}
... }
>>> 
>>> import pickle
>>> p = pickle.dumps(d)      
>>> p
"(dp0\nS'age'\np1\nI27\nsS'subdict'\np2\n(dp3\nS'second'\np4\nI2\nsS'third'\np5\nI3\nsS'first'\np6\nI1\nssS'name'\np7\nS'Joe'\np8\nsS'numbers'\np9\n(lp10\nI1\naI2\naI3\naI4\naI5\nas."

Ok, well, maybe it just takes some practice… or you could cheat…

>>> import pickletools 
>>> pickletools.dis(p)
    0: (    MARK
    1: d        DICT       (MARK at 0)
    2: p    PUT        0
    5: S    STRING     'age'
   12: p    PUT        1
   15: I    INT        27
   19: s    SETITEM
   20: S    STRING     'subdict'
   31: p    PUT        2
   34: (    MARK
   35: d        DICT       (MARK at 34)
   36: p    PUT        3
   39: S    STRING     'second'
   49: p    PUT        4
   52: I    INT        2
   55: s    SETITEM
   56: S    STRING     'third'
   65: p    PUT        5
   68: I    INT        3
   71: s    SETITEM
   72: S    STRING     'first'
   81: p    PUT        6
   84: I    INT        1
   87: s    SETITEM
   88: s    SETITEM
   89: S    STRING     'name'
   97: p    PUT        7
  100: S    STRING     'Joe'
  107: p    PUT        8
  110: s    SETITEM
  111: S    STRING     'numbers'
  122: p    PUT        9
  125: (    MARK
  126: l        LIST       (MARK at 125)
  127: p    PUT        10
  131: I    INT        1
  134: a    APPEND
  135: I    INT        2
  138: a    APPEND
  139: I    INT        3
  142: a    APPEND
  143: I    INT        4
  146: a    APPEND
  147: I    INT        5
  150: a    APPEND
  151: s    SETITEM
  152: .    STOP
highest protocol among opcodes = 0
>>> 

You’d still have to read the pickled object from a file; however, you wouldn’t need to load it. So, if it’s a “dangerous” object, you might still be able to figure that out before doing the load. If you are stuck with a pickle, this might be a good option for deciphering what you have.
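
The "inspect before loading" idea can be sketched in Python 3 with pickletools.genops(), which walks the opcodes without executing them. The SUSPICIOUS set and the suspicious_opcodes() helper are assumptions made for this illustration; the check is a rough heuristic and not a security guarantee.

```python
import io
import pickle
import pickletools

# Opcodes that import or call objects while a pickle is being loaded.
# This particular selection is an assumption made for this sketch.
SUSPICIOUS = {'GLOBAL', 'STACK_GLOBAL', 'REDUCE',
              'INST', 'OBJ', 'NEWOBJ', 'NEWOBJ_EX'}


def suspicious_opcodes(data):
    """Return the sorted names of suspicious opcodes found in the pickle bytes."""
    return sorted({op.name for op, arg, pos in pickletools.genops(data)
                   if op.name in SUSPICIOUS})


d = {'age': 27, 'name': 'Joe', 'numbers': [1, 2, 3, 4, 5]}
plain_pickle = pickle.dumps(d, protocol=0)

# Pickling a function stores a reference that is imported on load.
callable_pickle = pickle.dumps(len, protocol=0)

print(suspicious_opcodes(plain_pickle))
print(suspicious_opcodes(callable_pickle))

# The full disassembly can also be captured as text instead of printed.
report = io.StringIO()
pickletools.dis(plain_pickle, out=report)
print(report.getvalue())
```

A pickle made only of dicts, lists, strings, and numbers should show none of these opcodes, while anything that references a function or class will.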

Answered By: Mike McKerns