Library for JSON serialization/deserialization of objects in Python

Question:

I’m looking for a Python library to help with serializing/deserializing pretty complicated objects. I am writing a library where the top-level object looks, very roughly, like this (using dataclass syntax):

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ExampleProgram:
    component_1: ComponentType
    widgets: Dict[str, WidgetType]
    foo: List[FooType]

The key point here is that the three types are abstract base classes. I’m slightly abusing notation to mean that component_1 can be any object that derives from ComponentType (and likewise for the other two fields). ExampleProgram can take any combination of such subclass instances.

Most of what I’ve been looking at has been Pydantic, which nearly accomplishes this: if I know all possible ComponentTypes in advance, I can define a Union type. More precisely, if I know that all possible subclasses of ComponentType are ComponentType1, ..., ComponentTypeN, I can define a Union of these, and Pydantic is happy to accept it. This gives me (I believe) the ability to serialize and deserialize, potentially by adding a discriminator field to the ComponentTypes.
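For illustration, the closed-set pattern described here can be mimicked in plain Python with a lookup table from a discriminator value to each known subclass. This is a standard-library sketch, not Pydantic: the `kind` key, the concrete field names and the helpers `dump_component` and `load_component` are all hypothetical. It shows why every subclass has to be known in advance.

```python
import json
from abc import ABC
from dataclasses import asdict, dataclass


@dataclass
class ComponentType(ABC):
    """Abstract base; concrete subclasses carry the actual data."""


@dataclass
class ComponentType1(ComponentType):
    x: int


@dataclass
class ComponentType2(ComponentType):
    z: str


# Closed set: every allowed subclass must be listed here in advance,
# which is the same restriction a Union type imposes.
KNOWN_COMPONENTS = {
    "type1": ComponentType1,
    "type2": ComponentType2,
}
# Reverse table used when serializing.
TAG_FOR_CLASS = {cls: tag for tag, cls in KNOWN_COMPONENTS.items()}


def dump_component(obj: ComponentType) -> str:
    # Store the discriminator next to the dataclass fields.
    return json.dumps({"kind": TAG_FOR_CLASS[type(obj)], **asdict(obj)})


def load_component(raw: str) -> ComponentType:
    data = json.loads(raw)
    # The discriminator picks the concrete class; unknown tags raise KeyError.
    cls = KNOWN_COMPONENTS[data.pop("kind")]
    return cls(**data)
```

A subclass written later by a user is rejected with a `KeyError` in `dump_component` until someone edits `KNOWN_COMPONENTS`, which mirrors the Union limitation described above.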

However, a key part of my library is that users will be able to subclass ComponentType themselves and experiment. I don’t expect a library to help with serializing/deserializing those experimental subclasses; however, the object must still be constructible with arbitrary subclassed components. This is where Pydantic falls down.

Ultimately my requirements can be spelled out as follows:

  1. A library that can help with serializing/deserializing objects, where

    1. it can handle the fact that most class definitions are in terms of abstract base classes, and

    2. when serializing/deserializing, it can match the serialized form to the correct inherited class.

  2. The library does not impose so much structure that I can no longer instantiate objects by other means.

I am sort of expecting that I will need to add extra data to the classes somewhere to guide serialization/deserialization, and I’m happy to add it wherever needed, but I don’t even know where to start looking, either for a library or for what exactly to do.
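One common way to meet both requirements without a fixed Union is a registry that every subclass joins automatically, combined with a type tag stored in the serialized data. The following is a minimal standard-library sketch under stated assumptions, not any particular library's API: `Serializable`, `encode`, `decode` and the `__type__` key are hypothetical names, every concrete class is assumed to be a dataclass, and field values are assumed to be JSON-compatible, lists, dicts, or other `Serializable` objects.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, fields
from typing import Any, ClassVar, Dict, List


class Serializable(ABC):
    # Type tag -> class; filled in automatically by __init_subclass__.
    registry: ClassVar[Dict[str, type]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass, including ones users define later, registers itself.
        Serializable.registry[cls.type_tag()] = cls

    @classmethod
    def type_tag(cls) -> str:
        # Module plus qualified name keeps tags unique across modules.
        return f"{cls.__module__}.{cls.__qualname__}"


def encode(value: Any) -> Any:
    """Convert a tree of Serializable dataclasses into JSON-compatible data."""
    if isinstance(value, Serializable):
        data = {f.name: encode(getattr(value, f.name)) for f in fields(value)}
        data["__type__"] = type(value).type_tag()
        return data
    if isinstance(value, list):
        return [encode(v) for v in value]
    if isinstance(value, dict):
        return {k: encode(v) for k, v in value.items()}
    return value


def decode(data: Any) -> Any:
    """Inverse of encode: the stored tag selects the concrete subclass."""
    if isinstance(data, dict) and "__type__" in data:
        cls = Serializable.registry[data["__type__"]]
        return cls(**{k: decode(v) for k, v in data.items() if k != "__type__"})
    if isinstance(data, list):
        return [decode(v) for v in data]
    if isinstance(data, dict):
        return {k: decode(v) for k, v in data.items()}
    return data


class ComponentType(Serializable):
    @abstractmethod
    def run(self) -> int: ...


class WidgetType(Serializable):
    pass


class FooType(Serializable):
    pass


@dataclass
class ExampleProgram(Serializable):
    component_1: ComponentType
    widgets: Dict[str, WidgetType]
    foo: List[FooType]


# A "user" subclass defined after the library code: no Union to update.
@dataclass
class MyComponent(ComponentType):
    gain: int

    def run(self) -> int:
        return 2 * self.gain


@dataclass
class Button(WidgetType):
    label: str


@dataclass
class Bar(FooType):
    flag: bool


program = ExampleProgram(MyComponent(gain=3), {"ok": Button("OK")}, [Bar(True)])
text = json.dumps(encode(program))
restored = decode(json.loads(text))
```

Construction stays ordinary Python, so users can build `ExampleProgram` with any subclass; the tag only matters for `decode`. The trade-offs are that the module defining a subclass must be imported before decoding, and renaming or moving a class changes its tag.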

Asked By: PrehensileOwl


Answers:

It sounds as though what you are looking for can be accomplished with Pydantic’s generic models. Say you want to allow users of your top-level model to define their own subclasses of the three classes you use to annotate your model fields. You could make that top-level model generic in those three types and set the upper bounds of the type variables accordingly:

from typing import Generic, TypeVar

from pydantic import BaseModel
from pydantic.generics import GenericModel


class Component(BaseModel):
    x: int


class Widget(BaseModel):
    a: str


class Foo(BaseModel):
    bar: bool


C = TypeVar("C", bound=Component)
W = TypeVar("W", bound=Widget)
F = TypeVar("F", bound=Foo)


class Example(GenericModel, Generic[C, W, F]):
    component: C
    widgets: dict[str, W]
    foo: list[F]

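For the sake of simplicity, I am assuming that Component, Widget and Foo are all plain BaseModel subclasses. Note that this code targets Pydantic v1: in Pydantic v2, GenericModel has been removed (Example would subclass BaseModel and Generic[C, W, F] directly), and parse_raw is deprecated in favor of model_validate_json.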

If a user wants, for example, to use Widget and Foo for validation as-is but experiment with their own subclasses of Component, they could define a generic alias that fixes the second and third type arguments but keeps the first one generic, and then specify it only at the moment of actually parsing data.
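The alias trick relies on ordinary type-variable substitution from Python's `typing` module, which can be observed without Pydantic. In this standard-library sketch, empty classes stand in for the models and no validation takes place; it only shows that subscripting the partially specified alias fills in the one remaining type variable. Plain `typing` does not check the `bound` at runtime.

```python
from typing import Generic, TypeVar, get_args, get_origin


# Plain stand-ins for the Pydantic models in the answer.
class Component:
    pass


class Widget:
    pass


class Foo:
    pass


class SubComponent1(Component):
    pass


C = TypeVar("C", bound=Component)
W = TypeVar("W", bound=Widget)
F = TypeVar("F", bound=Foo)


class Example(Generic[C, W, F]):
    pass


_C = TypeVar("_C", bound=Component)
# Fix the 2nd and 3rd type arguments, keep the 1st one generic.
CustomExample = Example[_C, Widget, Foo]

# Supplying the remaining argument substitutes _C only.
concrete = CustomExample[SubComponent1]
print(get_origin(concrete) is Example)  # True
print(concrete == Example[SubComponent1, Widget, Foo])  # True
```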

Here is an example:

from typing import TypeVar

from pydantic import ValidationError

# ... import Component, Example, Foo, Widget


_C = TypeVar("_C", bound=Component)
CustomExample = Example[_C, Widget, Foo]


class SubComponent1(Component):
    y: int


class SubComponent2(Component):
    z: int


json_data = '''{
    "component": {"x": 0, "y": -1},
    "widgets": {},
    "foo": [
        {"bar": true},
        {"bar": false}
    ]
}'''
obj = CustomExample[SubComponent1].parse_raw(json_data)
print(obj)
try:
    CustomExample[SubComponent2].parse_raw(json_data)
except ValidationError as err:
    print(err.json(indent=4))

Output:

component=SubComponent1(x=0, y=-1) widgets={} foo=[Foo(bar=True), Foo(bar=False)]
[
    {
        "loc": [
            "component",
            "z"
        ],
        "msg": "field required",
        "type": "value_error.missing"
    }
]

As you can see, the validation logic depends on the specified type arguments. In this example, specifying SubComponent2 leads to a validation error with that test data because field z is required but the corresponding key is missing from the data.

There is of course no particular need for the user to create type variables and define type aliases ahead of time. You can just as easily specify all three type arguments at the same time, when parsing data:

obj = Example[SubComponent1, Widget, Foo].parse_raw(json_data)
print(obj)  # same output as the first one above

I am just assuming that the complexity might get a bit unwieldy if the number of type parameters grows, but that is par for the course.

For the sake of completeness, it should be noted that leaving Example completely unparametrized works too, in which case validation falls back to the upper bounds of the type variables. But that is bad form in terms of type safety and should be avoided:

obj = Example.parse_raw(json_data)
print(obj)

Output:

component=Component(x=0) widgets={} foo=[Foo(bar=True), Foo(bar=False)]

Notice that the y value is missing, even though it was present in our json_data. That is because the base Component, which has no such field, is used for validation, and by default Pydantic models simply ignore extra values.
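As a rough standard-library analogy (not how Pydantic is implemented), validation only looks at the fields declared on whichever class it is given: unknown keys are dropped and a missing declared field is an error. The `validate_as` helper is hypothetical and mirrors just those two behaviours from the outputs above.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


@dataclass
class Component:
    x: int


@dataclass
class SubComponent1(Component):
    y: int


@dataclass
class SubComponent2(Component):
    z: int


def validate_as(cls: Type[T], data: Dict[str, Any]) -> T:
    """Build cls from data, keeping only the fields cls declares."""
    known = {f.name: f for f in fields(cls)}
    missing = [
        name
        for name, f in known.items()
        if name not in data
        and f.default is MISSING
        and f.default_factory is MISSING
    ]
    if missing:
        raise ValueError(f"field required: {missing}")
    # Keys the class does not declare (such as "y" for Component) are dropped.
    return cls(**{k: v for k, v in data.items() if k in known})


data = {"x": 0, "y": -1}
print(validate_as(SubComponent1, data))  # SubComponent1(x=0, y=-1)
print(validate_as(Component, data))      # Component(x=0)
```

Calling `validate_as(SubComponent2, data)` raises a `ValueError` naming `z`, analogous to the "field required" error shown earlier.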


If you answer the questions I posted in the comments, clarify your requirements, or explain where this approach falls short for you, I’ll try to amend my answer accordingly.


PS

If you want to mix some of the abc magic into Component, Widget, Foo and so on, the simplest solution I can think of is to inherit from abc.ABC as well as from BaseModel, as shown in this section of the documentation. This enforces your requirement that such a class cannot be instantiated until a subclass implements its abstract methods.
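The effect of mixing in `abc.ABC` can be seen with the standard library alone. In this sketch a dataclass stands in for `BaseModel` (the answer's actual suggestion is to inherit from both `abc.ABC` and `BaseModel`), and the `describe` method is a hypothetical abstract method chosen for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Component(ABC):
    x: int

    @abstractmethod
    def describe(self) -> str:
        """Concrete components must say what they do."""


@dataclass
class SubComponent1(Component):
    y: int

    def describe(self) -> str:
        return f"x={self.x}, y={self.y}"


try:
    Component(x=0)
except TypeError as err:
    # ABCMeta refuses to instantiate a class with unimplemented abstract methods.
    print("rejected:", err)

print(SubComponent1(x=0, y=-1).describe())  # x=0, y=-1
```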

Answered By: Daniil Fajnberg