Pydantic BaseModel schema and instance serialization/deserialization not working

Question:

I am attempting to serialize a Pydantic model schema and then deserialize it in another script. The serialization process is working as expected, and it has created two JSON files: model.json and data.json.

In test_save.py, I defined the MainModel schema and then serialized it along with an instance of MainModel. The resulting JSON files contain the schema and data, respectively.

test_save.py

from pydantic import BaseModel
import json

# Model definition
class MainModel(BaseModel):
    foo: str

# Serialize MainModel
with open('model.json', 'w') as f:
    json.dump(MainModel.schema(), f, indent=4)

# Create Instance of MainModel
maindata = MainModel(foo='bar')

# Serialize Model Instance
with open('data.json', 'w') as f:
    json.dump(maindata.dict(), f, indent=4)

model.json

{
    "title": "MainModel",
    "type": "object",
    "properties": {
        "foo": {
            "title": "Foo",
            "type": "string"
        }
    },
    "required": [
        "foo"
    ]
}

data.json

{
    "foo": "bar"
}

In test_load.py, I’m attempting to deserialize the model.json and data.json files. The create_model function from pydantic is used to define the MainModel schema based on the schema in model.json.

test_load.py

import pydantic
import json

with open('model.json', 'r') as f:
    j = json.load(f)
    MainModel = pydantic.create_model('MainModel', **j)

    with open('data.json', 'r') as f:
        maindata = json.load(f)
        # modelinstance = MainModel.validate(maindata)
        modelinstance = MainModel.parse_obj(maindata)
        print(json.dumps(modelinstance.dict(), indent=4))  # This prints the schema instead of the data.

I’m attempting to deserialize a Pydantic model instance using the schema stored in model.json and data stored in data.json. However, when I run the script, instead of printing the data as expected, the script prints the schema from model.json.

The expected output from test_load.py is:

{
    "foo": "bar"
}

But the actual output is:

{
    "title": "MainModel",
    "type": "object",
    "properties": {
        "foo": {
            "title": "Foo",
            "type": "string"
        }
    },
    "required": [
        "foo"
    ]
}

I’m not sure what I’m doing wrong. Can anyone help me identify the issue?

Asked By: Andreas

||

Answers:

"The create_model function from pydantic is used to define the MainModel schema based on the schema in model.json"

That is the error.

I am not sure what made you think that this function takes a (dict-parsed) JSON schema as keyword arguments, but that is not how it works. To quote from the documentation of the create_model function:

Fields are defined by either a tuple of the form (<type>, <default value>) or just a default value.
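
For comparison, here is a minimal sketch of the call shape that the quoted sentence describes, assuming the single required str field from your MainModel. Passing `...` (Ellipsis) as the default marks the field as required. I construct the instance with keyword arguments only because that spelling works in both Pydantic 1.x and 2.x.

```python
from pydantic import create_model

# Each keyword argument is one field: name=(type, default).
# Ellipsis (...) as the default means "required, no default".
MainModel = create_model("MainModel", foo=(str, ...))

data = {"foo": "bar"}         # same content as data.json
instance = MainModel(**data)  # validates the data against the field definitions
print(instance.foo)
```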


What happened in your case is that the dictionary containing the model schema was unpacked as keyword arguments and thus it constructed a model with the fields named title, type, properties, and required.

The first two were passed strings ("MainModel" and "object" respectively), which the model creation function interpreted as default values, inferring the type of both fields to be str. The properties argument was a dictionary, which was interpreted as the default value for a field of type dict. The required argument was a list, which became the default value for a field of type list.
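
The unpacking step can be illustrated without Pydantic at all. The sketch below uses a hypothetical stand-in function, describe_fields, that only records which keyword arguments it receives and the Python type of each value. It is not how create_model is implemented, just a way to see what `**j` hands over.

```python
import json

# Same content as model.json from the question.
schema = json.loads("""
{
    "title": "MainModel",
    "type": "object",
    "properties": {"foo": {"title": "Foo", "type": "string"}},
    "required": ["foo"]
}
""")


def describe_fields(model_name, **field_definitions):
    """Hypothetical stand-in: report each keyword name and its value's type."""
    return {name: type(value).__name__ for name, value in field_definitions.items()}


received = describe_fields("MainModel", **schema)
print(received)
# {'title': 'str', 'type': 'str', 'properties': 'dict', 'required': 'list'}
```

Note that foo never appears as a keyword: it is buried inside the properties dictionary, so it cannot become a field.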

You can verify this by printing MainModel.schema_json(indent=4) after your create_model call.

Lastly, since all four fields therefore had default values and the default config setting for extra attributes is Extra.ignore, parsing your data.json resulted in the foo key simply being ignored, while all four fields were assigned their default values. That is how you ended up with a model instance that looked exactly like the model schema.
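
To make the effect concrete, the sketch below spells out explicitly the four fields that the unpacked schema defined implicitly, then feeds it the content of data.json. The name AccidentalModel is made up for illustration, and the explicit (type, default) tuples are my restatement of what was inferred, not what your script literally passed.

```python
from pydantic import create_model

# Explicit restatement of the model that create_model('MainModel', **j) built:
# four fields, each with a default taken from the schema dictionary.
AccidentalModel = create_model(
    "AccidentalModel",
    title=(str, "MainModel"),
    type=(str, "object"),
    properties=(dict, {"foo": {"title": "Foo", "type": "string"}}),
    required=(list, ["foo"]),
)

data = {"foo": "bar"}               # same content as data.json
instance = AccidentalModel(**data)  # "foo" is not a field, so it is ignored by default

print(instance.title, instance.type, instance.required)
print(hasattr(instance, "foo"))
```

Every field falls back to its default, and the only key the data actually contained is dropped.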


If you want to generate Pydantic models from a JSON schema, there is no built-in functionality for this. However, the Pydantic docs link to the datamodel-code-generator package, which can do that (and more). Since this is code generation, the models must exist as Python code before they can be imported: you need to run the code generator before launching the program that uses those models.
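
That said, for very simple schemas you can bridge the gap by hand. The following is only a sketch under strong assumptions: the schema is flat, every property has one of four primitive JSON types, and nested objects, arrays, $ref, formats, and constraints are not handled. The helper model_from_flat_schema and the JSON_TO_PYTHON table are names I made up; neither is part of Pydantic.

```python
from typing import Optional

from pydantic import create_model

# Assumed mapping; covers only primitive JSON Schema types.
JSON_TO_PYTHON = {"string": str, "integer": int, "number": float, "boolean": bool}


def model_from_flat_schema(schema):
    """Hypothetical helper: build a model from a flat, primitive-only schema."""
    required = set(schema.get("required", []))
    fields = {}
    for name, prop in schema["properties"].items():
        py_type = JSON_TO_PYTHON[prop["type"]]  # KeyError for unsupported types
        if name in required:
            fields[name] = (py_type, ...)             # required field
        else:
            fields[name] = (Optional[py_type], None)  # optional, defaults to None
    return create_model(schema["title"], **fields)


schema = {
    "title": "MainModel",
    "type": "object",
    "properties": {"foo": {"title": "Foo", "type": "string"}},
    "required": ["foo"],
}

MainModel = model_from_flat_schema(schema)
instance = MainModel(**{"foo": "bar"})
print(instance.foo)
```

Anything beyond this toy case is better left to a dedicated tool such as the code generator mentioned above.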

Answered By: Daniil Fajnberg

You can even create the model ‘on the fly’ and refer to it in your code:

from datamodel_code_generator import generate
from pathlib import Path

# saves model to file
generate(
    input_=Path("your_model_schema.json"),
    input_file_type="jsonschema",
    output=Path("your_model.py")
)

And for using it in your code:

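import importlib
from pathlib import Path

# 'Model' is the class name assumed here; check your_model.py for the generated name
Model = getattr(importlib.import_module("your_model"), 'Model')

path = Path('data_file.json')
your_json_data = Model.parse_file(path)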

Tested with Python 3.10.

Answered By: seniorNerd