understanding object_pairs_hook in json.loads()

Question:

In the docs here – https://docs.python.org/3/library/json.html

it says of object_pairs_hook:

object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders. If object_hook is also defined, the object_pairs_hook takes priority.

There is one rather impressive example of it in this answer.

I don’t understand what a “hook” is or how this feature works. The docs don’t really explain it very clearly. I would like to write one now (otherwise it will be a mess of string methods on the string I am parsing).

Does anyone know of a tutorial on this feature or understand it well enough to explain in detail how it works? They seem to assume in the docs that you know what is going on in the black box of json.loads()

Asked By: cardamom


Answers:

It allows you to customize what objects your JSON will parse into. This specific argument (object_pairs_hook) applies to JSON objects: each one is handed to the hook as a list of key/value pairs.

For instance if this string appears in your JSON:

{"var1": "val1", "var2": "val2"}

json.loads() will call the function you passed as the hook with the following argument:

[('var1', 'val1'), ('var2', 'val2')]

Whatever the function returns is what will be used in the resulting parsed structure where the above string was.
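To make the substitution concrete, here is a minimal sketch. The function name pairs_to_upper_keys and the sample JSON are made up for illustration; only json.loads() and its object_pairs_hook argument come from the standard library.

```python
import json

def pairs_to_upper_keys(pairs):
    # Hypothetical hook: 'pairs' is a list of (key, value) tuples for one JSON object.
    # Whatever is returned here replaces that object in the parsed result.
    return {key.upper(): value for key, value in pairs}

result = json.loads(
    '{"var1": "val1", "nested": {"var2": "val2"}}',
    object_pairs_hook=pairs_to_upper_keys,
)
print(result)  # {'VAR1': 'val1', 'NESTED': {'VAR2': 'val2'}}
```

The nested object is transformed too, because the hook is called once for every JSON object in the input, not only for the outermost one.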

A trivial example is object_pairs_hook=collections.OrderedDict, which ensures your keys are ordered the same way as they occurred in the incoming string.
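A short sketch of that OrderedDict case, with a made-up two-key input. OrderedDict works as a hook because its constructor accepts a list of (key, value) pairs.

```python
import json
from collections import OrderedDict

# OrderedDict is called with the list of pairs and its return value is used as the result.
ordered = json.loads('{"b": 1, "a": 2}', object_pairs_hook=OrderedDict)

print(type(ordered).__name__)  # OrderedDict
print(list(ordered.items()))   # [('b', 1), ('a', 2)]
```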

The generic idea of a hook is to allow you to register a function that is called (back) as needed for a given task. In this specific case it allows you to customize decoding of (different types of objects in the) incoming JSON string.
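One way to watch the "called back as needed" part is a hook that records every call. This is only a sketch: recording_hook, the calls list and the sample JSON are hypothetical names chosen for the example.

```python
import json

calls = []

def recording_hook(pairs):
    # Remember what the decoder handed us, then behave like the default (build a dict).
    calls.append(pairs)
    return dict(pairs)

json.loads('{"outer": {"inner": 1}, "items": [{"a": 2}]}', object_pairs_hook=recording_hook)

for call in calls:
    print(call)
# [('inner', 1)]
# [('a', 2)]
# [('outer', {'inner': 1}), ('items', [{'a': 2}])]
```

Inner objects are finished before the object that contains them, so by the time the hook sees the outer pairs, the nested values are already whatever the hook returned for them.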

Answered By: Ondrej K.

This is the only good answer I found when trying to understand the difference between object_pairs_hook vs object_hook, so I’ll add this here for others that may be looking for the same info. I wrote a quick test that shows the difference:

import json

json.loads('{"foo": "bar"}', object_pairs_hook=print)
json.loads('{"foo": "bar"}', object_hook=print)

Output

[('foo', 'bar')]
{'foo': 'bar'}

As you can see, the difference is the data type pushed to the hook function:

  • object_pairs_hook sends a list of tuples
  • object_hook sends a dictionary.
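The docs quoted in the question also say that object_pairs_hook takes priority when both hooks are given. A quick sketch to check that, using two hypothetical functions that note which one was called:

```python
import json

seen = []

def from_dict(obj):
    # Would receive a dict if object_hook were used.
    seen.append("object_hook")
    return obj

def from_pairs(pairs):
    # Receives a list of (key, value) tuples.
    seen.append("object_pairs_hook")
    return dict(pairs)

json.loads('{"foo": "bar"}', object_hook=from_dict, object_pairs_hook=from_pairs)
print(seen)  # ['object_pairs_hook']
```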

I couldn’t think of a good reason why this is useful, but of course the docs provide one: it helps deal with repeated names in the loaded JSON. A dict can hold each key only once, so without the pairs some data may be lost in the conversion.
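One possible way to act on repeated names is to refuse them outright. The following is a sketch, not something the json module provides: reject_duplicates is a hypothetical helper and the choice of ValueError is an assumption.

```python
import json

def reject_duplicates(pairs):
    # Build the dict by hand so a repeated key can be detected before it is overwritten.
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key in JSON object: {key!r}")
        result[key] = value
    return result

print(json.loads('{"foo": "bar"}', object_pairs_hook=reject_duplicates))
# {'foo': 'bar'}

try:
    json.loads('{"foo": "bar", "foo": "baz"}', object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)
# duplicate key in JSON object: 'foo'
```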


Edit: An illustrative example

import json

data = '{"foo": "bar", "foo": "baz"}'

json.loads(data, object_hook=print)
json.loads(data, object_pairs_hook=print)

Output

{'foo': 'baz'}
[('foo', 'bar'), ('foo', 'baz')]
Answered By: Sam Morgan

From some exploration, I think of the process as follows:
1.) With ‘object_pairs_hook’, each JSON object is parsed into a list of tuples and passed to the callback function (in the example below it is ‘print’).
2.) With ‘object_hook’, each JSON object is parsed into a dict, and that dict is passed to the callback function.

One of the basic use cases is:
If you receive unusual JSON in which some keys are duplicated, after parsing you will only get the latest ‘key:value’ pair for each duplicated key, which results in data loss. This behaviour can be overridden by using ‘object_pairs_hook’; see the example below.

import json

# object_pairs_hook receives every pair, including both 'name' entries
json.loads('{"name":"John", "age":30, "city":"New York","name":"james"}', object_pairs_hook=print)
# object_hook receives a dict, so only the last 'name' value survives
json.loads('{"name":"John", "age":30, "city":"New York","name":"james"}', object_hook=print)

[('name', 'John'), ('age', 30), ('city', 'New York'), ('name', 'james')]
{'name': 'james', 'age': 30, 'city': 'New York'}
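
Printing only shows the difference. To actually keep both values, the hook has to return something that can hold them. A minimal sketch, assuming you want repeated keys collected into a list (collect_duplicates is a made-up name, and other policies are equally valid):

```python
import json

def collect_duplicates(pairs):
    # Keys seen once keep their plain value; keys seen again get a list of all values.
    result = {}
    repeated = set()
    for key, value in pairs:
        if key in repeated:
            result[key].append(value)
        elif key in result:
            result[key] = [result[key], value]
            repeated.add(key)
        else:
            result[key] = value
    return result

merged = json.loads(
    '{"name":"John", "age":30, "city":"New York","name":"james"}',
    object_pairs_hook=collect_duplicates,
)
print(merged)  # {'name': ['John', 'james'], 'age': 30, 'city': 'New York'}
```

One caveat of this design: in the result, a repeated key cannot be told apart from a key whose single value was already a JSON array.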