How to dump a Python dictionary to JSON when keys are non-trivial objects?
Question:
import datetime, json
x = {'alpha': {datetime.date.today(): 'abcde'}}
print(json.dumps(x))
The above code fails with a TypeError since keys of JSON objects need to be strings. The json.dumps function has a parameter called default that is called when a value of a JSON object raises a TypeError, but there seems to be no way to do this for keys. What is the most elegant way to work around this?
Answers:
You can extend json.JSONEncoder to create your own encoder which will be able to deal with datetime.datetime objects (or objects of any type you desire) in such a way that a string is created which can be reproduced as a new datetime.datetime instance. I believe it should be as simple as having json.JSONEncoder call repr() on your datetime.datetime instances.
The procedure on how to do so is described in the json module docs.
The json module checks the type of each value it needs to encode, and by default it only knows how to handle dicts, lists, tuples, strs, ints, floats, bools and None (plus unicode and long objects on Python 2) 🙂
Also of importance for you might be the skipkeys argument to the JSONEncoder.
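As a small illustration of skipkeys, here is a sketch with made-up sample data (the "kept" key and the fixed date are only placeholders). With skipkeys=True, a key that is not a str, int, float, bool or None is silently dropped instead of raising, which may or may not be what you want:

```python
import datetime
import json

# Sample data: one unsupported key (a date) and one ordinary string key.
data = {"alpha": {datetime.date(2010, 9, 15): "abcde", "kept": 1}}

# skipkeys=True drops the date key instead of raising TypeError.
result = json.dumps(data, skipkeys=True)
print(result)  # {"alpha": {"kept": 1}}
```

Note that the dropped entry is lost for good, so this only helps when those entries are expendable.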
After reading your comments I have concluded that there is no easy way to have JSONEncoder encode the keys of dictionaries with a custom function. If you are interested, you can look at the source: iterencode() calls _iterencode(), which calls _iterencode_dict(), which is where the TypeError gets raised.
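To make the difference between values and keys concrete, here is a minimal sketch. DateEncoder is a hypothetical subclass written for this illustration, not something from the json module; it handles dates as values, yet the same date used as a key still raises before default() is consulted:

```python
import datetime
import json


class DateEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes date/datetime values as ISO strings."""

    def default(self, o):
        if isinstance(o, (datetime.date, datetime.datetime)):
            return o.isoformat()
        return super().default(o)


day = datetime.date(2010, 9, 15)

# A date used as a value goes through default() and is encoded.
value_result = json.dumps({"when": day}, cls=DateEncoder)

# A date used as a key never reaches default(); the encoder raises first.
try:
    json.dumps({day: "abcde"}, cls=DateEncoder)
    key_error = None
except TypeError as exc:
    key_error = exc

print(value_result)
print(type(key_error).__name__)
```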
Easiest for you would be to create a new dict with isoformatted keys like this:
import datetime, json
D = {datetime.datetime.now(): 'foo',
     datetime.datetime.now(): 'bar'}
new_D = {}
# build a new dict whose keys are ISO 8601 strings
for k, v in D.items():
    new_D[k.isoformat()] = v
json.dumps(new_D)
This returns '{"2010-09-15T23:24:36.169710": "foo", "2010-09-15T23:24:36.169723": "bar"}'. For niceties, wrap it in a function 🙂
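Wrapping the key conversion in a function might look like the sketch below. The names stringify_keys and _key are made up for this example, and the choice to recurse into nested dicts, lists and tuples is an assumption about what you need; it is not part of the json module:

```python
import datetime
import json


def _key(k):
    """Turn date/datetime keys into ISO strings; leave other keys alone."""
    if isinstance(k, (datetime.date, datetime.datetime)):
        return k.isoformat()
    return k


def stringify_keys(obj):
    """Return a copy of obj in which every date-like dict key is an ISO string."""
    if isinstance(obj, dict):
        return {_key(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        # tuples become lists, which is what json.dumps would do anyway
        return [stringify_keys(item) for item in obj]
    return obj


x = {"alpha": {datetime.date(2010, 9, 15): "abcde"}}
encoded = json.dumps(stringify_keys(x))
print(encoded)  # {"alpha": {"2010-09-15": "abcde"}}
```

The original dict is left untouched, so you can keep working with real date keys in memory.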
http://jsonpickle.github.io/ might be what you want. When facing a similar issue, I ended up doing:
to_save = jsonpickle.encode(THE_THING, unpicklable=False, max_depth=4, make_refs=False)
You can do:
x = {'alpha': {datetime.date.today().strftime('%d-%m-%Y'): 'abcde'}}
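If you also need the dates back after loading, the same format string can be used in reverse. This is only a sketch: the FMT constant and the fixed sample date are assumptions for the example, and it relies on knowing that the inner keys are dates:

```python
import datetime
import json

FMT = "%d-%m-%Y"  # same format string as in the line above

day = datetime.date(2010, 9, 15)
text = json.dumps({"alpha": {day.strftime(FMT): "abcde"}})

# Decoding: parse each inner key back into a date object.
loaded = json.loads(text)
restored = {
    outer: {datetime.datetime.strptime(k, FMT).date(): v for k, v in inner.items()}
    for outer, inner in loaded.items()
}

print(text)      # {"alpha": {"15-09-2010": "abcde"}}
print(restored)
```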
If you really need to do it, you can monkeypatch json.encoder:
import json
from _json import encode_basestring_ascii  # used when ensure_ascii=True (the default, where you want everything to be ASCII)
from _json import encode_basestring  # used in any other case

def _patched_encode_basestring(o):
    """
    Monkey-patching Python's json serializer so it can serialize keys that are not strings!
    You can monkey-patch the ASCII one the same way.
    """
    # MyClass and my_serialize stand for your own type and your own serializer
    if isinstance(o, MyClass):
        return my_serialize(o)
    return encode_basestring(o)

json.encoder.encode_basestring = _patched_encode_basestring
JSON only accepts the data types mentioned here for encoding. As @supakeen mentioned, you can extend the JSONEncoder class to encode any values inside a dictionary, but not keys! If you want to encode keys, you have to do it on your own.
I used a recursive function to encode tuple keys as strings and recover them later.
Here is an example:
import copy
from typing import Any

def _tuple_to_string(obj: Any) -> Any:
    """Serialize tuple keys to a string representation. A tuple gets a leading '__tuple__' marker and is written out in list representation.

    Args:
        obj (Any): Typically a dict, tuple, list, int, or string.

    Returns:
        Any: Input object with serialized tuples.
    """
    # deep copy object to avoid manipulation during iteration
    obj_copy = copy.deepcopy(obj)
    # if the object is a dictionary
    if isinstance(obj, dict):
        # iterate over every key
        for key in obj:
            # reset on every iteration so a value from an earlier key is never reused
            serialized_key = None
            # if key is tuple
            if isinstance(key, tuple):
                # stringify the key
                serialized_key = f"__tuple__{list(key)}"
                # replace old key with encoded key
                obj_copy[serialized_key] = obj_copy.pop(key)
            # if the key was modified
            if serialized_key is not None:
                # do it again for the next nested dictionary
                obj_copy[serialized_key] = _tuple_to_string(obj[key])
            # else, just do it for the next dictionary
            else:
                obj_copy[key] = _tuple_to_string(obj[key])
    return obj_copy
This will turn a tuple of the form ("blah", "blub") into "__tuple__['blah', 'blub']" so that you can dump it using json.dumps() or json.dump(). You can use the leading "__tuple__" to detect such keys during decoding. For that, I used this function:
def _string_to_tuple(obj: Any) -> Any:
    """Convert serialized tuples back to their original representation. Tuples need to have a leading "__tuple__" marker.

    Args:
        obj (Any): Typically a dict, tuple, list, int, or string.

    Returns:
        Any: Input object with recovered tuples.
    """
    # deep copy object to avoid manipulation during iteration
    obj_copy = copy.deepcopy(obj)
    # if the object is a dictionary
    if isinstance(obj, dict):
        # iterate over every key
        for key in obj:
            # reset on every iteration so a value from an earlier key is never reused
            serialized_key = None
            # if key is a serialized tuple starting with the "__tuple__" prefix
            if isinstance(key, str) and key.startswith("__tuple__"):
                # decode it to a tuple
                serialized_key = tuple(key.split("__tuple__")[1].strip("[]").replace("'", "").split(", "))
                # if every entry is a number in string representation
                if all(entry.isdigit() for entry in serialized_key):
                    # convert to integers
                    serialized_key = tuple(map(int, serialized_key))
                # replace old key with decoded key
                obj_copy[serialized_key] = obj_copy.pop(key)
            # if the key was modified
            if serialized_key is not None:
                # do it again for the next nested dictionary
                obj_copy[serialized_key] = _string_to_tuple(obj[key])
            # else, just do it for the next dictionary
            else:
                obj_copy[key] = _string_to_tuple(obj[key])
    # if the object is a list, decode each item and keep the results
    elif isinstance(obj, list):
        obj_copy = [_string_to_tuple(item) for item in obj]
    return obj_copy
Insert your custom logic for encoding and decoding your own types by changing the
if isinstance(key, tuple):
    # stringify the key
    serialized_key = f"__tuple__{list(key)}"
in the _tuple_to_string function or the corresponding code block from the _string_to_tuple function, respectively:
if isinstance(key, str) and key.startswith("__tuple__"):
    # decode it to a tuple
    serialized_key = tuple(key.split("__tuple__")[1].strip("[]").replace("'", "").split(", "))
    # if every entry is a number in string representation
    if all(entry.isdigit() for entry in serialized_key):
        # convert to integers
        serialized_key = tuple(map(int, serialized_key))
Then, you can use it as usual:
>>> dct = {("L1", "L1"): {("L2", "L2"): "foo"}}
>>> print(json.dumps(_tuple_to_string(dct)))
{"__tuple__['L1', 'L1']": {"__tuple__['L2', 'L2']": "foo"}}
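The string-splitting decoder above can get confused when a tuple element itself contains ", " or a quote character. A possible variation, shown here only as a sketch, is to write the tuple as a JSON list behind the same "__tuple__" marker and to rebuild it with json.loads via the object_pairs_hook parameter. The names PREFIX, encode_keys and decode_pairs are made up for this example:

```python
import json

PREFIX = "__tuple__"  # same marker as above


def encode_keys(obj):
    """Replace tuple keys with PREFIX + JSON list; recurse into dicts and lists."""
    if isinstance(obj, dict):
        return {
            (PREFIX + json.dumps(list(k)) if isinstance(k, tuple) else k): encode_keys(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [encode_keys(item) for item in obj]
    return obj


def decode_pairs(pairs):
    """object_pairs_hook: rebuild tuple keys from PREFIX-marked strings."""
    return {
        (tuple(json.loads(k[len(PREFIX):])) if k.startswith(PREFIX) else k): v
        for k, v in pairs
    }


dct = {("L1", 1): {("a, b", "L2"): "foo"}, "plain": [{(1, 2): 3}]}
text = json.dumps(encode_keys(dct))
restored = json.loads(text, object_pairs_hook=decode_pairs)
print(restored == dct)  # True
```

One limitation of this sketch: a tuple nested inside a tuple key comes back as a list, so it only round-trips flat tuples of JSON-compatible values.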
Hope this helps!
This is something that CANNOT BE DONE. That is, neither the default function in json nor the approach of extending JSONEncoder will work. See this issue:
https://github.com/python/cpython/issues/63020
The reason is that the developers think that supporting anything other than strings as keys for serialization should be discouraged.
See also:
json.dump not calling default or cls