Convert numpy type to python
Question:
I have a list of dicts in the following form that I generate from pandas. I want to convert it to a json format.
list_val = [{1.0: 685}, {2.0: 8}]
output = json.dumps(list_val)
However, json.dumps throws an error: TypeError: 685 is not JSON serializable
I am guessing it’s a type conversion issue from numpy to python(?).
However, when I convert the values v of each dict in the array using np.int32(v) it still throws the error.
EDIT: Here’s the full code
new = df[df[label] == label_new]
ks_dict = json.loads(content)
ks_list = ks_dict['variables']
freq_counts = []
for ks_var in ks_list:
    freq_var = dict()
    freq_var["name"] = ks_var["name"]
    ks_series = new[ks_var["name"]]
    temp_df = ks_series.value_counts().to_dict()
    freq_var["new"] = [{u: np.int32(v)} for (u, v) in temp_df.iteritems()]
    freq_counts.append(freq_var)
out = json.dumps(freq_counts)
Answers:
It looks like you’re correct:
>>> import numpy
>>> import json
>>> json.dumps(numpy.int32(685))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python2.7/json/encoder.py", line 184, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 685 is not JSON serializable
The unfortunate thing here is that numpy numbers' __repr__ doesn't give you any hint about what type they are. They're running around masquerading as ints when they aren't (gasp). Ultimately, it looks like json is telling you that an int isn't serializable, but really, it's telling you that this particular np.int32 (or whatever type you actually have) isn't serializable. (No real surprise there; no np.int32 is serializable.) This is also why the dict that you inevitably printed before passing it to json.dumps looks like it just has integers in it as well.
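You can see the masquerade directly (a quick check, assuming Python 3 and NumPy): the string form is indistinguishable from a plain int, but an isinstance check exposes the difference.

```python
import numpy as np

n = np.int32(685)
print(str(n))              # prints 685, just like a plain int would
print(isinstance(n, int))  # False: on Python 3, np.int32 is not an int subclass
print(type(n).__name__)    # int32
```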
The easiest workaround here is probably to write your own serializer1:
class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, numpy.integer):
            return int(obj)
        elif isinstance(obj, numpy.floating):
            return float(obj)
        elif isinstance(obj, numpy.ndarray):
            return obj.tolist()
        else:
            return super(MyEncoder, self).default(obj)
You use it like this:
json.dumps(numpy.float32(1.2), cls=MyEncoder)
json.dumps(numpy.arange(12), cls=MyEncoder)
json.dumps({'a': numpy.int32(42)}, cls=MyEncoder)
etc.
1 Or you could just write the default function and pass that as the default keyword argument to json.dumps. In this scenario, you'd replace the last line (the super call) with raise TypeError, but … meh. The class is more extensible 🙂
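For completeness, the default-function variant mentioned in the footnote might look something like this (a sketch; the function name numpy_default is arbitrary):

```python
import json
import numpy

def numpy_default(obj):
    """Convert numpy scalars and arrays to plain Python types for json."""
    if isinstance(obj, numpy.integer):
        return int(obj)
    elif isinstance(obj, numpy.floating):
        return float(obj)
    elif isinstance(obj, numpy.ndarray):
        return obj.tolist()
    raise TypeError(repr(obj) + " is not JSON serializable")

print(json.dumps({'a': numpy.int32(42)}, default=numpy_default))  # {"a": 42}
```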
If you leave the data in any of the pandas objects, the library supplies a to_json function on Series, DataFrame, and all of the other higher-dimension cousins.
See Series.to_json()
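For instance, the value-counts computation from the question could go straight to JSON without ever leaving pandas (a sketch, assuming a recent pandas; the sample data is made up):

```python
import pandas as pd

s = pd.Series([1.0, 1.0, 2.0, 1.0])
# value_counts() yields numpy int64 counts, but to_json handles them natively
print(s.value_counts().to_json())
```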
You could also convert the array to a python list (use the tolist method) and then convert the list to json.
You can use our fork of ujson to deal with NumPy int64. caiyunapp/ultrajson: Ultra fast JSON decoder and encoder written in C with Python bindings and NumPy bindings
pip install nujson
Then
>>> import numpy as np
>>> import nujson as ujson
>>> a = {"a": np.int64(100)}
>>> ujson.dumps(a)
'{"a":100}'
>>> a["b"] = np.float64(10.9)
>>> ujson.dumps(a)
'{"a":100,"b":10.9}'
>>> a["c"] = np.str_("12")
>>> ujson.dumps(a)
'{"a":100,"b":10.9,"c":"12"}'
>>> a["d"] = np.array(list(range(10)))
>>> ujson.dumps(a)
'{"a":100,"b":10.9,"c":"12","d":[0,1,2,3,4,5,6,7,8,9]}'
>>> a["e"] = np.repeat(3.9, 4)
>>> ujson.dumps(a)
'{"a":100,"b":10.9,"c":"12","d":[0,1,2,3,4,5,6,7,8,9],"e":[3.9,3.9,3.9,3.9]}'
If you have a dict consisting of multiple numpy objects, like an ndarray or a float32 object, you can manually convert an ndarray to a list using .tolist():
import numpy as np
import json
a = np.empty([2, 2], dtype=np.float32)
json.dumps(a.tolist()) # this should work
or save a float32 object using .item():
import numpy as np
import json
a = np.float32(1)
json.dumps(a.item()) # this should work
But if you have a complex dict with multiple dicts nested in lists, which are further nested with numpy objects, navigating your code and manually updating each variable becomes cumbersome, and you might not want to do that. Instead you can define a NumpyEncoder class which handles this for you during json.dumps() (reference):
class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.float32):
            return obj.item()
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

with open('output.json', 'w') as outfile:
    json.dump(json_dict, outfile, sort_keys=True, indent=4, cls=NumpyEncoder)  # indent and sort_keys are just for cleaner output
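The same encoder also works with json.dumps on nested structures; as a self-contained sketch (repeating the class so the snippet runs on its own, with made-up sample data):

```python
import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.float32):
            return obj.item()
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

# numpy objects nested inside lists and dicts are converted recursively
nested = {"scores": [np.float32(0.5), {"grid": np.arange(3)}]}
print(json.dumps(nested, cls=NumpyEncoder))  # {"scores": [0.5, {"grid": [0, 1, 2]}]}
```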
This worked perfectly for me, and it even lets you handle other data types when saving to JSON, for example formatting the decimal places when saving:
class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, float):
            return "{:.2f}".format(obj)
        return json.JSONEncoder.default(self, obj)
In some cases a simple json.dump(eval(str(a)), your_file) helps (though be aware that eval on arbitrary strings is unsafe).
In a simpler case when you only have numpy numbers to be converted the easiest is:
json.dumps(a, default=float)
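Applied to data shaped like the question's (sample values made up), with the caveat that every numpy number comes back as a JSON float:

```python
import json
import numpy as np

a = [{1.0: np.int64(685)}, {2.0: np.int64(8)}]
# default=float is called for each non-serializable numpy scalar
print(json.dumps(a, default=float))  # [{"1.0": 685.0}, {"2.0": 8.0}]
```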