Why does converting a Numpy array to JSON fail while converting it to a string succeeds?

Question:

For the following code:

import json
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

normalString = str(arr)
print(normalString)

jsonString = json.dumps(arr)
print(jsonString)

Converting this to a string and then printing it works just fine; it prints out [1 2 3 4 5]. However, when I try to serialize it to JSON, I get TypeError: Object of type ndarray is not JSON serializable on the line where I call json.dumps. Why isn’t a Numpy array "directly" serializable to JSON?

I’m aware of the fact that this can be circumvented by converting the Numpy array to a list before trying to serialize it. There are numerous tutorials and Q&As that explain that fact, such as the previously-linked Q&A as well as this one. However, none of the Q&As I’ve found so far explain why this is the case. The closest I’ve found so far is a comment on the latter post explaining that

numpy.ndarray is not a type that json knows how to handle. You’ll either need to write your own serializer, or (more simply) just pass list(your_array) to whatever is writing the json.

However, this comment doesn’t explain exactly why the json module doesn’t know how to handle this. All of the answers to the question explain how to correct the problem, not why it occurs in the first place.

Can someone explain why the json module doesn’t know how to handle this but the str function does? Again, I’m not asking how to fix this, as the linked Q&As already explain that.

Answers:

I’ll try. To answer your question, you have to look at how the standard JSON encoder is implemented. It’s here: https://github.com/python/cpython/blob/3.10/Lib/json/encoder.py
As for why str works: numpy.ndarray defines its own __str__ method, so Python knows how to render it as a string; the json module, by contrast, only handles a fixed set of types unless you extend the encoder. It certainly can be replaced/extended to include numpy.ndarray encoding, but the standard class docstring says this:

class JSONEncoder(object):

"""Extensible JSON <http://json.org> encoder for Python data structures.
Supports the following objects and types by default:
+-------------------+---------------+
| Python            | JSON          |
+===================+===============+
| dict              | object        |
+-------------------+---------------+
| list, tuple       | array         |
+-------------------+---------------+
| str               | string        |
+-------------------+---------------+
| int, float        | number        |
+-------------------+---------------+
| True              | true          |
+-------------------+---------------+
| False             | false         |
+-------------------+---------------+
| None              | null          |
+-------------------+---------------+
To extend this to recognize other objects, subclass and implement a
``.default()`` method with another method that returns a serializable
object for ``o`` if possible, otherwise it should call the superclass
implementation (to raise ``TypeError``).
"""

Why are only the classes in the left column supported?
I can make a few guesses (in no particular order):

  1. Performance. Underneath (see the function _make_iterencode) there is a chain of if isinstance(…), elif isinstance(…), elif isinstance(…), … checks. A numpy array is not a special object that deserves a branch more than others: numpy.matrix, torch.Tensor, and pandas.DataFrame could equally need encoding, as could various classes from collections such as defaultdict, Counter, and namedtuple. Should all of them be added here?
  2. Irreversibility. JSON encode/decode operations should ideally be bijective (reversible). How, then, would you mark that a value was a numpy array? Encode it as a string instead of a list? Wrap it in a JSON object with a ‘type’ field set to "numpy.array"? Either choice makes the output ugly and hard to reason about.
  3. Complexity. Numpy arrays exist for math operations, and there is no guarantee about what they may contain. For example, numpy arrays may hold complex numbers or quaternions, which the Python json module cannot serialize. Even if you made those serializable somehow, there is no guarantee that other complicated types won’t appear in numpy arrays in the future.
  4. The JSON standard. JSON is a standard serialization format used by many programming languages and very common on the web (see https://www.json.org/json-en.html). It has a fixed set of types. Why extend it only in Python? For serializing other objects there are other formats, such as pickle, dill, and parquet; using whatever is best suited to a particular problem, instead of tampering with a very common format, is probably the better solution.
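Points 2 and 3 are easy to demonstrate with the standard list-conversion workaround (a small sketch, using only the stdlib json module and numpy):

```python
import json
import numpy as np

# Point 2 (irreversibility): round-tripping through a list loses the
# ndarray type entirely -- nothing in the JSON marks it as an array.
arr = np.array([1, 2, 3])
restored = json.loads(json.dumps(arr.tolist()))
print(type(restored))  # <class 'list'>, not numpy.ndarray

# Point 3 (complexity): a complex-valued array can't even be converted
# this way, because complex itself is not JSON serializable.
carr = np.array([1 + 2j, 3 - 4j])
try:
    json.dumps(carr.tolist())
except TypeError as e:
    print(e)  # e.g. "Object of type complex is not JSON serializable"
```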

Hope this answers your question. Thank you!

Answered By: Nikolay Zakirov