Understanding object_pairs_hook in json.loads()
Question:
The docs at https://docs.python.org/3/library/json.html say of object_pairs_hook:
object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders. If object_hook is also defined, the object_pairs_hook takes priority.
There is one rather impressive example of it in this answer.
I don’t understand what a “hook” is or how this feature works. The docs don’t really explain it very clearly. I would like to write one now (otherwise it will be a mess of string methods on the string I am parsing).
Does anyone know of a tutorial on this feature, or understand it well enough to explain in detail how it works? The docs seem to assume that you know what is going on inside the black box of json.loads().
Answers:
It allows you to customize what objects your JSON will parse into. This specific argument (object_pairs_hook) deals with pairs (read: the key/value pairs of a mapping object).
For instance if this string appears in your JSON:
{"var1": "val1", "var2": "val2"}
It will call the function pointed to with the following argument:
[('var1', 'val1'), ('var2', 'val2')]
Whatever the function returns is what will be used in the resulting parsed structure where the above string was.
A trivial example is object_pairs_hook=collections.OrderedDict, which ensures your keys are ordered the same way as they occurred in the incoming string.
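As a minimal sketch of that trivial example, the snippet below passes OrderedDict as the hook for the string shown earlier. The variable names text and parsed are only illustrative. Note that since Python 3.7 a plain dict also preserves insertion order, so this mainly matters when you want OrderedDict-specific behaviour.

```python
import json
from collections import OrderedDict

text = '{"var1": "val1", "var2": "val2"}'

# OrderedDict accepts a list of (key, value) pairs, which is exactly
# what object_pairs_hook hands to it.
parsed = json.loads(text, object_pairs_hook=OrderedDict)

print(type(parsed).__name__)   # OrderedDict
print(list(parsed.items()))    # [('var1', 'val1'), ('var2', 'val2')]
```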
The generic idea of a hook is to allow you to register a function that is called (back) as needed for a given task. In this specific case it allows you to customize decoding of (different types of objects in the) incoming JSON string.
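To make the "called back as needed" idea concrete, here is a small sketch that records every call the decoder makes to the hook. The names record_pairs and calls are hypothetical, chosen for this illustration. It shows that the hook runs once per JSON object, with nested objects handled before the object that contains them.

```python
import json

calls = []  # every argument the decoder passes to the hook, in call order

def record_pairs(pairs):
    # pairs is a list of (key, value) tuples for one JSON object
    calls.append(pairs)
    # whatever is returned here replaces that object in the result
    return dict(pairs)

result = json.loads('{"outer": {"inner": 1}, "x": 2}',
                    object_pairs_hook=record_pairs)

for call in calls:
    print(call)
# [('inner', 1)]
# [('outer', {'inner': 1}), ('x', 2)]
```

By the time the outer object reaches the hook, the inner object has already been replaced by the hook's return value.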
This is the only good answer I found when trying to understand the difference between object_pairs_hook and object_hook, so I’ll add this here for others who may be looking for the same info. I wrote a quick test that shows the difference:
import json
json.loads('{"foo": "bar"}', object_pairs_hook=print)
json.loads('{"foo": "bar"}', object_hook=print)
Output
[('foo', 'bar')]
{'foo': 'bar'}
As you can see, the difference is the data type passed to the hook function:
object_pairs_hook receives a list of tuples.
object_hook receives a dictionary.
I couldn’t think of a good reason why this is useful, but of course the docs provide one: object_pairs_hook helps deal with repeated names in the loaded JSON, where one would otherwise lose data in the conversion.
Edit: An illustrative example
import json
data = '{"foo": "bar", "foo": "baz"}'
json.loads(data, object_hook=print)
json.loads(data, object_pairs_hook=print)
Output
{'foo': 'baz'}
[('foo', 'bar'), ('foo', 'baz')]
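Because the pairs hook sees both occurrences of a repeated name, one possible use is to refuse such input outright. The following is a sketch under that assumption; reject_duplicates and error_message are hypothetical names, and raising ValueError is just one reasonable choice.

```python
import json

def reject_duplicates(pairs):
    # pairs is a list of (key, value) tuples for one JSON object
    seen = set()
    for key, _ in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen.add(key)
    return dict(pairs)

data = '{"foo": "bar", "foo": "baz"}'
error_message = None
try:
    json.loads(data, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # duplicate key: 'foo'
```

With object_hook this check would not be possible, since the duplicate has already been collapsed by the time the dict is built.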
Some exploration I have done:
I think of the process as follows:
1.) With ‘object_pairs_hook’, each JSON object is decoded into a list of (key, value) tuples and passed to the callback function; in the example below the callback is ‘print’.
2.) With ‘object_hook’, each JSON object is passed to the callback as a dict.
One basic use case:
If the original JSON is unusual and contains duplicate keys, after parsing you will only get the last ‘key: value’ pair, which results in data loss. This behaviour can be overridden by using ‘object_pairs_hook’; see the example below.
import json
json.loads('{"name":"John", "age":30, "city":"New York","name":"james"}', object_pairs_hook=print)
json.loads('{"name":"John", "age":30, "city":"New York","name":"james"}', object_hook=print)
Output
[('name', 'John'), ('age', 30), ('city', 'New York'), ('name', 'james')]
{'name': 'james', 'age': 30, 'city': 'New York'}
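Printing the pairs shows the data is available, but to actually keep it the hook has to return something. One way to do that, sketched here with a hypothetical helper named group_values, is to map every key to a list of all the values seen for it. This is only one possible policy for handling repeated keys.

```python
import json

def group_values(pairs):
    # Map each key to a list of every value that appeared for it,
    # so repeated keys keep all of their values.
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped

text = '{"name":"John", "age":30, "city":"New York","name":"james"}'
parsed = json.loads(text, object_pairs_hook=group_values)
print(parsed)
# {'name': ['John', 'james'], 'age': [30], 'city': ['New York']}
```

Wrapping every value in a list, even for keys that appear once, keeps the result type predictable for the caller.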